Chalmers Advanced Python

Lab 1: Information extraction

Advanced Python Course
Chalmers DAT690 / DIT516 / DAT516
2025

by Aarne Ranta & John J. Camilleri

NEWS

2025-11-13: Added a better example and explanation to the time dictionary specification. The specification itself has not changed, but it is hopefully clearer now. If you already have your lab working well, you don’t need to do anything. But as several students have asked about it, I thought it would be useful to improve the explanation.

Purpose

The purpose of Lab 1 is to read information from different formats and combine it into useful data structures. We will consider two different data formats:

The data collected from these files is saved in a new JSON file, tramnetwork.json, which is ready to be used in applications - including Labs 2 and 3. The command python3 tramdata.py init produces this file.

The target data structures are dictionaries, which enable efficient queries about the data. If run with the command python3 tramdata.py, the program will enable the following kind of dialogue:

$ python3 tramdata.py
> via Chalmers
['6', '7', '8', '10', '13']
> between Chalmers and Valand
['7', '10']
> time with 6 from Chalmers to Järntorget
10
> distance from Chalmers to Järntorget
1.628

These structures and queries are preparation for the later labs, where they are embedded in an object-oriented hierarchy (Lab 2) and used in the back-end of a web application (Lab 3).

Learning outcomes:

Task

The task is to write three functions that build dictionaries, four functions that extract information from them, and a dialogue function that answers queries. The dialogue function should be divided into two parts to enable more accurate testing and debugging.

Dictionary building functions

build_tram_stops(jsonobject), building a stop dictionary, where

Here is a part of the stop dictionary, showing just one stop:

{
  'Majvallen': {
    'lat': 57.6909343,
    'lon': 11.9354935
  }
}

An input file in the expected format is tramstops.json. The function involves an easy conversion using the json library.

build_tram_lines(lines), building a line dictionary, where

Here is an example:

{
  "9": [
    "Angered Centrum",
    "Storås",
    "Hammarkullen",
    # many more stops in between
    "Sandarna",
    "Kungssten"
  ]
}

An input file in the expected format is tramlines.txt. It is a textual representation of timetables for each line, looking as follows:

1:
Östra Sjukhuset           10:00
Tingvallsvägen            10:01
Kaggeledstorget           10:03
Ättehögsgatan             10:03

Thus, for each tram line, there is a section starting with the line number and a colon. After that, the stops are given together with times. For simplicity, each line starts from time 10:00. We are not interested in these times as such, but in the transition times between adjacent stops. Thus, for instance, the transition time between Tingvallsvägen and Kaggeledstorget is 2 minutes. We want to store the transition times in a non-redundant way, under the following assumptions:

Hence, we don’t want to add transition times to the line dictionary, because this would lead to storing redundant information. Instead, from the file tramlines.txt, we also build a time dictionary which stores the times between adjacent stops, where

Here is an example of a time dictionary entry:

{
  "Kaggeledstorget" : {
    "Tingvallsvägen": 2,
    "Ättehögsgatan": 0
  }
}
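As a rough illustration of the timetable format described above, the following sketch reads the sample section for line 1 and derives the stop order and the transition times between adjacent stops. The names SAMPLE, to_minutes and parse_timetable are hypothetical helpers, not part of the lab specification, and the sketch deliberately stops short of building the time dictionary itself.

```python
SAMPLE = """\
1:
Östra Sjukhuset           10:00
Tingvallsvägen            10:01
Kaggeledstorget           10:03
Ättehögsgatan             10:03
"""


def to_minutes(clock):
    """Convert an 'HH:MM' string to minutes since midnight."""
    hours, minutes = clock.split(":")
    return int(hours) * 60 + int(minutes)


def parse_timetable(text):
    """Return (lines, transitions) for timetable text (hypothetical helper)."""
    lines = {}        # line number -> ordered list of stop names
    transitions = []  # (stop, next_stop, minutes) for adjacent stops
    current = None    # line number of the section being read
    previous = None   # (stop, minutes) of the previous row in this section
    for row in text.splitlines():
        row = row.strip()
        if not row:
            continue
        if row.endswith(":"):
            # a new section starts with the line number and a colon
            current = row[:-1]
            lines[current] = []
            previous = None
            continue
        # stop names may contain spaces, so split off only the last field
        stop, clock = row.rsplit(maxsplit=1)
        now = to_minutes(clock)
        lines[current].append(stop)
        if previous is not None:
            transitions.append((previous[0], stop, now - previous[1]))
        previous = (stop, now)
    return lines, transitions
```

Splitting from the right with rsplit keeps multi-word stop names such as Östra Sjukhuset intact, because only the final whitespace-separated field is treated as the time.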

To summarize, the general idea with these data structures and functions is to avoid redundancy: every piece of information is given only once in the dictionaries. In particular,

Hint (not necessary to follow, but may be useful): A way to enforce the latter condition is to use alphabetical order: the time dictionary of Kaggeledstorget includes Tingvallsvägen, but not the other way round. When you then need to look up the time from Tingvallsvägen to Kaggeledstorget, you can find it by first looking up Kaggeledstorget.
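The ordering idea in the hint can be sketched as follows. The helpers store_time and lookup_time are hypothetical names introduced for illustration. Note that Python's sorted compares strings by Unicode code point, which is not Swedish alphabetical order (Ä sorts after Z), but any consistent ordering is enough to guarantee that each pair is stored only once.

```python
def store_time(timedict, stop_a, stop_b, minutes):
    """Store the time for a pair of stops under the smaller name only."""
    first, second = sorted([stop_a, stop_b])
    timedict.setdefault(first, {})[second] = minutes


def lookup_time(timedict, stop_a, stop_b):
    """Look up the time for a pair of stops, whatever order they are given in."""
    first, second = sorted([stop_a, stop_b])
    return timedict[first][second]
```

With this convention, storing the pairs from the timetable sample yields exactly the entry shown above for Kaggeledstorget, and a lookup from Tingvallsvägen to Kaggeledstorget finds the same value as the lookup in the opposite direction.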

Moreover, you should aim at the following:

build_tram_network(stopfile, linefile) puts everything together. It reads the two input files and writes a third one, entitled tramnetwork.json. This JSON file represents a dictionary that contains the three dictionaries built:

{
  "stops": {
    "Östra Sjukhuset": {
      "lat": 57.7224618,
      "lon": 12.0478166
    },  // and so on, the entire stop dict
  },
  "lines": {
    "1": [
      "Östra Sjukhuset",
      "Tingvallsvägen",
      // and so on, all stops on line 1
    ],  // and so on, the entire line dict
  },
  "times": {
    "Tingvallsvägen": {
      "Kaggeledstorget": 2
    },  // and so on, the entire time dict
  }
}
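The // comments and trailing commas in the listing above are only there for presentation; they are not valid JSON and will not appear in the real file. A minimal sketch of writing and reading such a file with the standard json module could look like this. The function names save_network and load_network are hypothetical, and the choice of ensure_ascii=False and indent=2 is an assumption made to keep Swedish stop names readable in the file.

```python
import json


def save_network(network, path):
    """Write the combined dictionary to a JSON file (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps characters such as Ö and ä readable
        json.dump(network, f, ensure_ascii=False, indent=2)


def load_network(path):
    """Read the combined dictionary back from a JSON file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Opening the file with an explicit utf-8 encoding avoids depending on the platform default, which matters once non-ASCII characters are written unescaped.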

Query functions

Each of the following functions uses one or more of the dictionaries you built.

lines_via_stop(linedict, stop) lists the lines that go via the given stop. The lines should be sorted in their numeric order, that is, ‘2’ before ‘10’.
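The sorting requirement is easy to get wrong, because line numbers are stored as strings. The short sketch below contrasts the default string sort with a numeric sort; it assumes that every line name consists of digits only, and the helper name sort_lines is hypothetical.

```python
def sort_lines(line_names):
    """Sort line names such as '2' and '10' by their numeric value."""
    return sorted(line_names, key=int)


unsorted_lines = ["10", "6", "13", "8", "7"]

# plain string comparison puts '10' and '13' before '6'
print(sorted(unsorted_lines))      # ['10', '13', '6', '7', '8']

# converting each name to int for comparison gives the required order
print(sort_lines(unsorted_lines))  # ['6', '7', '8', '10', '13']
```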

lines_between_stops(linedict, stop1, stop2) lists the lines that go from stop1 to stop2. The lines should be sorted in their numeric order, that is, ‘2’ before ‘10’. Notice that all lines are assumed to run in both directions.

time_between_stops(linedict, timedict, line, stop1, stop2) calculates the time from stop1 to stop2 along the given line. This is obtained as the sum of all times between adjacent stops. If the stops are not along the same line, an error message is printed.
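One possible way to sum the times between adjacent stops is sketched below, using the line 1 sample from the timetable. The helper names adjacent_time and travel_time are hypothetical, the lookup tries both storage directions so that it does not depend on how the time dictionary was ordered, and an unknown stop raises ValueError here instead of printing the error message that the lab asks for.

```python
def adjacent_time(timedict, stop_a, stop_b):
    """Time between two adjacent stops, whichever way the pair was stored."""
    if stop_b in timedict.get(stop_a, {}):
        return timedict[stop_a][stop_b]
    return timedict[stop_b][stop_a]


def travel_time(linedict, timedict, line, stop1, stop2):
    """Sum the times between adjacent stops from stop1 to stop2 on one line."""
    stops = linedict[line]
    # list.index raises ValueError if a stop is not on this line
    start, end = sorted([stops.index(stop1), stops.index(stop2)])
    segment = stops[start:end + 1]
    # pair each stop with its successor and add up the transition times
    return sum(adjacent_time(timedict, a, b)
               for a, b in zip(segment, segment[1:]))


linedict = {"1": ["Östra Sjukhuset", "Tingvallsvägen",
                  "Kaggeledstorget", "Ättehögsgatan"]}
timedict = {
    "Kaggeledstorget": {"Tingvallsvägen": 2, "Ättehögsgatan": 0},
    "Tingvallsvägen": {"Östra Sjukhuset": 1},
}
print(travel_time(linedict, timedict, "1", "Östra Sjukhuset", "Ättehögsgatan"))  # 3
```

Sorting the two positions first means the same slice is used in both travel directions, which matches the assumption that lines run both ways.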

distance_between_stops(stopdict, stop1, stop2) calculates the geographic distance between any two stops, based on their latitude and longitude. The distance is hence not dependent on the tram lines. You can implement this function by using the Haversine library.
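To show what such a distance calculation involves, here is a sketch of the haversine formula written with the standard math module only, so that it runs without installing anything. It is not the library mentioned above; the function name great_circle_km and the Earth radius constant are assumptions of this sketch, and results may differ slightly from the library depending on the radius it uses.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius in kilometres


def great_circle_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    delta_phi = phi2 - phi1
    delta_lambda = math.radians(lon2 - lon1)
    # haversine formula
    a = (math.sin(delta_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(delta_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# coordinates taken from the stop examples in this document
majvallen = (57.6909343, 11.9354935)
ostra_sjukhuset = (57.7224618, 12.0478166)
distance = great_circle_km(*majvallen, *ostra_sjukhuset)
print(round(distance, 3))
```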

Tests for dictionary building & querying

Testing will be addressed more systematically in Lab 2 and will also be a part of it. However, you can already try your hand at writing tests, because they are a great help in developing your code. The file templates/test_tramdata.py already tests whether all stops associated with lines in linedict also exist in stopdict. You could try to add at least the following tests:

The dialogue function

The dialogue(tramfile) function implements a dialogue about tram information. It starts by reading the data from the JSON file tramnetwork.json, which has been produced by your program. Then it takes user input and answers any number of questions by using your query functions. The following kinds of input are interpreted:

The main challenge is to deal correctly with stop names that consist of more than one word. One hint is to locate the positions of keywords such as “and”, which can appear between stop names, and to take slices that start or end at them. The simplest tool for this is the standard index() method of strings. The regular expression library re could also be used, but it probably takes more effort to learn unless you already know it.
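The keyword idea can be sketched for the “between” query as follows. The helper name split_between and its return convention (a pair of stop names, or None when the query does not match) are assumptions made for illustration only. The sketch splits at the first occurrence of “ and ” surrounded by spaces, so it would misparse a stop name that itself contains that word.

```python
def split_between(query):
    """Parse 'between A and B' into (A, B); return None if it does not match."""
    prefix = "between "
    keyword = " and "
    if not query.startswith(prefix):
        return None
    rest = query[len(prefix):]
    try:
        # position of the keyword that separates the two stop names
        pos = rest.index(keyword)
    except ValueError:
        return None
    stop1 = rest[:pos].strip()
    stop2 = rest[pos + len(keyword):].strip()
    return stop1, stop2


print(split_between("between Sankt Sigfrids Plan and Botaniska Trädgården"))
# ('Sankt Sigfrids Plan', 'Botaniska Trädgården')
```

Because the slices are taken around the keyword position instead of splitting on every space, multi-word stop names survive intact.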

For the purpose of testing, and more generally to cleanly separate input and output from processing, the dialogue() function should be divided into two separate functions:

Tests for the dialogue

Testing a complete dialogue is tricky, but you can easily test the answer_query(tramdict, query) function. What you should test is that the answer printed for a query (in the format written by the user) is the same as the expected answer. This then tests that queries are parsed and interpreted correctly.

There is already one example of this in test_tramdata.py, which you should extend with your own test cases. Here are some more examples to get you started:

> via Botaniska Trädgården
['1', '2', '7', '8', '13']
> between Medicinaregatan and Saltholmen
['13']
> time with 5 from Munkebäckstorget to Sankt Sigfrids Plan
9
> distance from Temperaturgatan to Lackarebäck
10.092

Remember to also test negative examples, to ensure that your error handling works correctly, e.g.:

> between Medicinareberget and Saltholmen
unknown arguments
> distance between Chalmers and Ramberget
sorry, try again

The main function

At the end of your file, make a conditional call under

if __name__ == '__main__':

calling build_tram_network() if the argument init is present, dialogue() otherwise.
Hint: You can check the presence of this argument by using sys.argv:

if __name__ == '__main__':
    if sys.argv[1:] == ['init']:
        # argument order follows build_tram_network(stopfile, linefile)
        build_tram_network("tramstops.json", "tramlines.txt")
    else:
        dialogue("tramnetwork.json")

You also need to import sys.

Submission

You should submit the following files:

The submitted code must be usable in the following ways: