Part 7

Data processing

Reading CSV files

CSV is such a simple format that so far we have accessed the files with hand-written code. There is, however, a ready-made module in the Python standard library for working with CSV files: csv. It works like this:

import csv

with open("test.csv") as my_file:
    for line in csv.reader(my_file, delimiter=";"):
        print(line)

The above code reads all lines in the CSV file test.csv, separates the contents of each line into a list using the delimiter ;, and prints each list. So, assuming the contents of the file are as follows:

012121212;5
012345678;2
015151515;4

The code would print out this:

Sample output

['012121212', '5']
['012345678', '2']
['015151515', '4']
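csv.reader accepts any iterable of strings, not only an open file, so the parsing above can also be tried without a file on disk. The following sketch wraps the sample content in io.StringIO and builds a dictionary from the rows. It assumes, purely for illustration, that the first column is an identifier and the second a whole number; the function name read_counts is hypothetical.

```python
import csv
import io

# The sample content from above; io.StringIO lets a string stand in for an open file
sample = "012121212;5\n012345678;2\n015151515;4\n"


def read_counts(text_file) -> dict:
    """Map the first column to the second column converted to int.

    Hypothetical helper: assumes each non-empty row has exactly two fields.
    """
    counts = {}
    for row in csv.reader(text_file, delimiter=";"):
        if not row:  # csv.reader yields an empty list for an empty line
            continue
        identifier, amount = row
        counts[identifier] = int(amount)  # csv.reader always yields strings
    return counts


print(read_counts(io.StringIO(sample)))
```

Keeping the identifiers as strings preserves their leading zeros, which converting them with int would drop.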

Since the CSV format is so simple, what's the use of having a separate module when we could just as well use the split method? Well, for one, the module also handles quoted values correctly: a value enclosed in double quotes is read as a single string even if it contains the delimiter character. If some line in the file looked like this

"aaa;bbb";"ccc;ddd"

the above code would produce this:

Sample output

['aaa;bbb', 'ccc;ddd']

Using the split method would also split inside the quoted strings, producing ['"aaa', 'bbb"', '"ccc', 'ddd"'] instead of two values. This would likely break the data, and our program along with it.
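The difference can be seen side by side in a short sketch. Here csv.reader is given a one-element list instead of a file, which works because it accepts any iterable of strings.

```python
import csv

line = '"aaa;bbb";"ccc;ddd"'

# str.split knows nothing about quotes, so it also splits inside them
naive = line.split(";")

# csv.reader treats each double-quoted field as a single value
parsed = next(csv.reader([line], delimiter=";"))

print(naive)   # ['"aaa', 'bbb"', '"ccc', 'ddd"']
print(parsed)  # ['aaa;bbb', 'ccc;ddd']
```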

Reading JSON files

CSV is just one of many machine-readable data formats. JSON is another, and it is often used when data has to be transferred between applications.

JSON files are text files with a strict format, which is perhaps a little less accessible to the human eye than the CSV format. The following example uses the file courses.json, which contains information about some courses:

[
    {
        "name": "Introduction to Programming",
        "abbreviation": "ItP",
        "periods": [1, 3]
    },
    {
        "name": "Advanced Course in Programming",
        "abbreviation": "ACiP",
        "periods": [2, 4]
    },
    {
        "name": "Database Application",
        "abbreviation": "DbApp",
        "periods": [1, 2, 3, 4]
    }
]

The structure of a JSON file might look quite familiar to you by now. The JSON file above looks just like a Python list containing three Python dictionaries.
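The resemblance is close but not exact. JSON writes its literals as true, false and null, which become True, False and None in Python, and JSON strings must use double quotes. A small sketch of both points; the keys "mandatory" and "teacher" are hypothetical and do not appear in courses.json.

```python
import json

# JSON literals true/false/null are converted to True/False/None
text = '{"name": "Database Application", "mandatory": false, "teacher": null}'
course = json.loads(text)
print(course)

try:
    # Single quotes are valid Python syntax but not valid JSON
    json.loads("{'name': 'ItP'}")
except json.JSONDecodeError as error:
    print("Not valid JSON:", error)
```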

The standard library has a module for working with JSON data: json. The function json.loads takes a string containing data in JSON format and transforms it into the corresponding Python data structure. So, processing the courses.json file with the code below

import json

with open("courses.json") as my_file:
    data = my_file.read()

courses = json.loads(data)
print(courses)

would print out the following:

Sample output

[{'name': 'Introduction to Programming', 'abbreviation': 'ItP', 'periods': [1, 3]}, {'name': 'Advanced Course in Programming', 'abbreviation': 'ACiP', 'periods': [2, 4]}, {'name': 'Database Application', 'abbreviation': 'DbApp', 'periods': [1, 2, 3, 4]}]
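Instead of reading the file into a string and then calling json.loads, the json module also provides json.load, which reads directly from a file object. The sketch below uses io.StringIO with a shortened version of courses.json so that it runs without the file; with the real file, the argument would simply be the object returned by open("courses.json").

```python
import io
import json

# A shortened stand-in for courses.json; an open file object works the same way
my_file = io.StringIO(
    '[{"name": "Introduction to Programming", "abbreviation": "ItP", "periods": [1, 3]}]'
)

# json.load reads and parses in one step, whereas json.loads needs a string
courses = json.load(my_file)
print(courses[0]["periods"])  # [1, 3]
```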

If we also wanted to print out the name of each course, we could expand our program with a for loop:

for course in courses:
    print(course["name"])

Sample output

Introduction to Programming
Advanced Course in Programming
Database Application
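Each course is a dictionary and the value under "periods" is a list, so nested data is reached with ordinary indexing, loops and comprehensions. A sketch that writes the parsed data out as a Python literal so it runs without courses.json; the helper name describe is hypothetical.

```python
# The parsed contents of courses.json, written out as a Python literal
courses = [
    {"name": "Introduction to Programming", "abbreviation": "ItP", "periods": [1, 3]},
    {"name": "Advanced Course in Programming", "abbreviation": "ACiP", "periods": [2, 4]},
    {"name": "Database Application", "abbreviation": "DbApp", "periods": [1, 2, 3, 4]},
]


def describe(course: dict) -> str:
    # "periods" holds a list of ints, so each one is converted to str before joining
    periods = ", ".join(str(period) for period in course["periods"])
    return f"{course['abbreviation']}: periods {periods}"


for course in courses:
    print(describe(course))

# Names of the courses offered in period 1
print([course["name"] for course in courses if 1 in course["periods"]])
```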


Retrieving a file from the internet

The Python standard library also contains modules for dealing with online content. One useful function is urllib.request.urlopen, which retrieves content from the internet so that it can be processed in your programs. You are encouraged to have a look at the entire module, but the following example should be enough for you to get to grips with the function.

The following code would print out the contents of the University of Helsinki front page:

import urllib.request

# urlopen returns a response object; read() gives the page contents as bytes
with urllib.request.urlopen("https://helsinki.fi") as response:
    print(response.read())

Pages intended for human eyes do not usually look very pretty when their code is printed out. In the following examples, however, we will work with machine-readable data from an online source. Much of the machine-readable data available online is in JSON format.
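Combining urlopen with the json module, machine-readable data can be fetched and parsed in a few lines. This is a minimal sketch, assuming the address returns JSON encoded in UTF-8. The function name fetch_json is hypothetical, and the demo uses a data: URL, which urlopen also supports, so that it runs without a network connection.

```python
import json
import urllib.request


def fetch_json(url: str):
    """Download the resource at url and parse it as JSON (hypothetical helper).

    Assumes the body is JSON in UTF-8; json.loads also detects UTF-16/32 in bytes.
    """
    with urllib.request.urlopen(url) as response:
        body = response.read()  # read() returns bytes, not str
    return json.loads(body)  # json.loads accepts bytes since Python 3.6


# A data: URL keeps this demo offline; for real use, pass an https address
# that serves JSON (no specific address is assumed here)
print(fetch_json('data:application/json,[{"name":"ItP","periods":[1,3]}]'))
```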


Looking for modules

The official Python documentation contains information on all modules available in the standard library:

In addition to the standard library, the internet is full of freely available Python modules for different purposes. Some commonly used modules are listed here:
