Day 18 Project: JSON Reading List

Welcome to the day 18 project in the 30 Days of Python series! Today we're going to be modifying our reading list application one final time to practice a really important module called json.

The json module is used for serialising and deserialising JSON data. This is really just a fancy way of saying it provides a means of translating our Python data to the JSON format, and from JSON back to Python.

Before we get to the brief, let's talk a little bit about what the JSON format looks like, and the important functions we're going to be making use of in the json module.

JSON

JSON stands for JavaScript Object Notation, and like CSV it's an incredibly common format for storing data. It's very widely used on the web, particularly for sharing information between websites or apps and a server.

The JSON format is actually going to look very familiar to us, because it's almost identical to that of Python dictionaries and lists.

Here is an example of a JSON object, which is roughly analogous to a dictionary:

{
    "title": "1Q84",
    "author": "Haruki Murakami",
    "year": 2009,
    "read": true
}

There are a few things we have to be careful to remember with JSON. The first is that all strings, which includes key names, must use double quotes ("). Single quotes (') are not valid in JSON.

The second thing we have to take note of is the keyword true. This is entirely lowercase in JSON, unlike in Python where we use True. In JSON false is also a valid Boolean value, which is the same as the Python False.

If we want to have several independent JSON objects, which is the case in our project, we need to wrap the objects in a JSON array, which is analogous to a Python list.

[
    {
        "title": "1Q84",
        "author": "Haruki Murakami",
        "year": 2009,
        "read": true
    },
    {
        "title": "The Picture of Dorian Gray",
        "author": "Oscar Wilde",
        "year": 1890,
        "read": false 
    }
]

For JSON to be valid, the content must consist of exactly one top-level value. In practice this is nearly always a single JSON object or a single JSON array, and that's what we'll be using here. This means that an empty file is not valid JSON.

We can nest these structures as deeply as we like, so it's perfectly legal to have an array of objects that contain arrays of objects filled with other objects. It's not uncommon to see JSON in this very complicated format either.

While it would be a little difficult, we could write code to go through this JSON data and convert it to Python, but thankfully we don't have to do this ourselves. It's such a common operation that we have a json module for exactly this purpose in the standard library.

The `json` module

There are two important functions we need to be aware of in the json module.

The first of these functions is json.load, which is going to allow us to take JSON data from a file and convert it to standard Python types, like dictionaries, lists, integers, strings, etc.
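As a rough illustration of that conversion, the sketch below hands json.load an in-memory file and checks the Python types that come back. Here io.StringIO stands in for a real open file, and the sample text (including the "tags" key) is made up for the example.

```python
import io
import json

# io.StringIO behaves like an open text file, so json.load can read from it.
sample = io.StringIO('{"title": "1Q84", "year": 2009, "read": true, "tags": ["fiction"]}')

book = json.load(sample)

print(type(book))          # a JSON object becomes a dict
print(type(book["year"]))  # a JSON number with no decimal point becomes an int
print(book["read"])        # JSON true becomes Python True
print(type(book["tags"]))  # a JSON array becomes a list
```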

This function requires that the file is in valid JSON format, so we have to be careful not to violate any of the rules I mentioned before. That includes not trying to read an empty file!

We're going to use it like this:

with open("books.json", "r") as reading_list:
    books = json.load(reading_list)
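To see the empty file problem in action, here is a small sketch that again uses io.StringIO as a stand-in for real files. It relies on try and except, which the series covers in the following day, so treat it as a preview rather than something you need for the project.

```python
import io
import json

empty_file = io.StringIO("")     # stands in for an empty books.json
empty_array = io.StringIO("[]")  # stands in for a file containing only []

try:
    json.load(empty_file)
    outcome = "loaded"
except json.JSONDecodeError:
    # json.load raises this error when the content isn't valid JSON
    outcome = "invalid JSON"

print(outcome)                 # invalid JSON
print(json.load(empty_array))  # []
```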

The second of the functions we need to learn about is json.dump. This function expects the Python data we want to convert, which for us will be a dictionary or list, as well as a file to write to.

When working with dump, we're going to be writing to the file in write mode ("w"), which will truncate the file. We're then going to write a new JSON array to the file which contains all of the books in the library.

The reason we're going to take this approach is because we have to have a single top-level array or object in our file to satisfy the JSON specification, so we can't simply append new objects to the existing file content.
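To see why appending fails, here is a minimal sketch that works with strings rather than files, using json.loads and json.dumps (the string versions of load and dump described in the note further down). The sample data is only for illustration.

```python
import json

existing = '[{"title": "1Q84"}]'

# Simulates appending a new object to the end of the existing file content
appended = existing + '\n{"title": "The Picture of Dorian Gray"}'

try:
    json.loads(appended)
    appended_is_valid = True
except json.JSONDecodeError:
    # Two top-level values in a row is not valid JSON
    appended_is_valid = False

# Rebuilding the whole array keeps a single top-level value
books = json.loads(existing)
books.append({"title": "The Picture of Dorian Gray"})
rewritten = json.dumps(books)

print(appended_is_valid)  # False
print(rewritten)
```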

Don't worry. While this may seem wasteful, rewriting the whole file is going to feel near-instantaneous for a reading list of any realistic size. By the time performance becomes a problem, we should probably have already moved on to something like a database for storage instead.

Here is an example of how we're going to use the dump function:

with open("books.json", "w") as reading_list:
    json.dump(books, reading_list)  # books is the list of books we want to write

Note

There are also versions of dump and load called dumps and loads, but these do not do quite the same thing.

The "s" in the function names refers to strings, because these functions either take in JSON as a string, or give us a string representation of the data we've converted to JSON. You can find plenty of examples in the json module documentation.
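As a quick sketch of the difference, the example below converts a dictionary to a JSON string with dumps and back again with loads, without touching any files. The book data is the same sample used earlier.

```python
import json

book = {"title": "1Q84", "author": "Haruki Murakami", "year": 2009, "read": True}

# dumps returns the JSON as a string instead of writing it to a file
as_text = json.dumps(book)
print(as_text)  # {"title": "1Q84", "author": "Haruki Murakami", "year": 2009, "read": true}

# loads takes a JSON string instead of reading from a file
round_trip = json.loads(as_text)
print(round_trip == book)  # True
```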

The brief

The starting point for this project is going to be the code we wrote in day 14. If you haven't yet attempted the harder version of the day 14 project, I'd recommend giving that a go before continuing here, as you'll end up with an application with much more impressive functionality.

The task for this project is relatively simple. We need to update the implementation of our reading list application so that our data is stored in a new file called books.json. We'll be retrieving all of our book data from this new file as well.

If you're starting a new repl or workspace for this project, there's no need to copy across books.csv as we won't be using it any longer.

Since an empty file is not valid JSON, I'd recommend creating your books.json file with some empty square brackets inside. This is an empty JSON array, which is perfectly valid, and we can populate this JSON array with JSON objects. Each object is going to represent a different book, and you can think of the file structure as something like a list of dictionaries.

During the project walkthrough, I'm going to show you a way to take care of creating the file programmatically, so that we don't need to have a books.json file in advance. We'll be learning about the tools we need to do this tomorrow, so this is not something you have to take care of yourself.

As I mentioned in the JSON introduction above, we're going to have to completely replace the contents of the books.json file whenever we make changes, because simply appending new JSON objects will quickly lead to a file which doesn't comply with the JSON specification. Rather than accounting for this, it's much easier to simply replace the entire file.

Good luck!

Our solution

Our solution is all below, or you can watch the video version of the walkthrough.

I'm going to be starting off with the code from the harder version of the project, which looks like this:

def add_book():
    title = input("Title: ").strip().title()
    author = input("Author: ").strip().title()
    year = input("Year of publication: ").strip()

    with open("books.csv", "a") as reading_list:
        reading_list.write(f"{title},{author},{year},Not Read\n")

def delete_book(books, book_to_delete):
    books.remove(book_to_delete)

def find_books():
    reading_list = get_all_books()
    matching_books = []

    search_term = input("Please enter a book title: ").strip().lower()

    for book in reading_list:
        if search_term in book["title"].lower():
            matching_books.append(book)

    return matching_books

# Helper function for retrieving data from the csv file
def get_all_books():
    books = []

    with open("books.csv", "r") as reading_list:
        for book in reading_list:
            # Extracts the values from the CSV data
            title, author, year, read_status = book.strip().split(",")

            # Creates a dictionary from the csv data and adds it to the books list
            books.append({
                "title": title,
                "author": author,
                "year": year,
                "read": read_status
            })

    return books

def mark_book_as_read(books, book_to_update):
    index = books.index(book_to_update)
    books[index]['read'] = "Read"

def update_reading_list(operation):
    books = get_all_books()
    matching_books = find_books()

    if matching_books:
        operation(books, matching_books[0])

        with open("books.csv", "w") as reading_list:
            for book in books:
                reading_list.write(f"{book['title']},{book['author']},{book['year']},{book['read']}\n")
    else:
        print("Sorry, we didn't find any books matching that title.")

def show_books(books):
    # Adds an empty line before the output
    print()

    for book in books:
        print(f"{book['title']}, by {book['author']} ({book['year']}) - {book['read']}")

    print()

menu_prompt = """Please enter one of the following options:

- 'a' to add a book
- 'd' to delete a book
- 'l' to list the books
- 'r' to mark a book as read
- 's' to search for a book
- 'q' to quit

What would you like to do? """

# Get a selection from the user
selected_option = input(menu_prompt).strip().lower()

# Run the loop until the user selects 'q'
while selected_option != "q":
    if selected_option == "a":
        add_book()
    elif selected_option == "d":
        update_reading_list(delete_book)
    elif selected_option == "l":
        # Retrieves the whole reading list for printing
        reading_list = get_all_books()

        # Check that reading_list contains at least one book
        if reading_list:
            show_books(reading_list)
        else:
            print("Your reading list is empty.")
    elif selected_option == "r":
        update_reading_list(mark_book_as_read)
    elif selected_option == "s":
        matching_books = find_books()

        # Checks that the search returned at least one book
        if matching_books:
            show_books(matching_books)
        else:
            print("Sorry, we didn't find any books for that search term")
    else:
        print(f"Sorry, '{selected_option}' isn't a valid option.")

    # Allow the user to change their selection at the end of each iteration
    selected_option = input(menu_prompt).strip().lower()

There's a fair bit of code here, but take your time to go through it. I'd recommend looking at the project walkthrough from day 14 if you're having difficulty understanding anything.

A great deal of this program is going to end up being exactly the same. The functions we need to change in our current implementation are:

add_book
get_all_books
update_reading_list

None of our other functions work with the file at all, so their implementation is going to remain unchanged. The same goes for our menu code, which isn't tied in any way to our method of storage. It simply presents an interface to the user and delegates functionality to our functions.

Let's start by importing the json module at the top of the file, since we're going to need to use it in several places.

import json

With json imported, we can start modifying get_all_books. This is a good place to start since we rely on this function in a few different places, and we're going to need it for our new add_book implementation as well.

def get_all_books():
    books = []

    with open("books.csv", "r") as reading_list:
        for book in reading_list:
            # Extracts the values from the CSV data
            title, author, year, read_status = book.strip().split(",")

            # Creates a dictionary from the csv data and adds it to the books list
            books.append({
                "title": title,
                "author": author,
                "year": year,
                "read": read_status
            })

    return books

At the moment we read our old books.csv file, process each line of the file, construct a dictionary from it, and append that dictionary to the books list. We then return this list of dictionaries for use elsewhere in our application.

Our new implementation is actually going to be a great deal simpler, because we no longer need to process any of the data ourselves. The json.load function is going to be able to see that we have a JSON array, potentially populated with JSON objects, and it is going to give us back a list of dictionaries corresponding to that data.

def get_all_books():
    with open("books.json", "r") as reading_list:
        books = json.load(reading_list)

    return books

This demonstrates how powerful using other modules can be. Parsing JSON is not a terribly simple task, and here we accomplished it all on one line.

For reference, there is also a csv module that would have allowed us to do something very similar when working with CSV data. We have a short video covering how to use this here.

We can make our get_all_books function even shorter as well, because we don't really need this books variable. We can return the list of dictionaries from within the context manager.

def get_all_books():
    with open("books.json", "r") as reading_list:
        return json.load(reading_list)
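If you're wondering whether returning from inside the with block leaves the file open, the sketch below suggests it doesn't: the context manager still closes the file on the way out. The load_json function and the opened_files list are hypothetical names used only so we can inspect the file object afterwards, and the file lives in a temporary folder rather than in the project.

```python
import json
import os
import tempfile

opened_files = []  # keeps a reference to the file object so we can inspect it later

def load_json(path):
    with open(path, "r") as json_file:
        opened_files.append(json_file)
        return json.load(json_file)  # returning here still triggers the clean-up

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "books.json")

    # Create a valid JSON file containing an empty array
    with open(path, "w") as json_file:
        json.dump([], json_file)

    books = load_json(path)

print(books)                   # []
print(opened_files[0].closed)  # True
```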

Now let's turn to add_book. This function is going to see some fairly significant changes, because we can no longer just write an extra line to our file. We need to first get hold of the current file contents, add an extra record, and then write the new contents back to the file.

Luckily we have a function to grab the contents of a file already: the newly updated get_all_books function. Let's call this right at the top of add_book. We can then put our original prompts directly after, since we still need those.

def add_book():
    books = get_all_books()

    title = input("Title: ").strip().title()
    author = input("Author: ").strip().title()
    year = input("Year of publication: ").strip()

Our next step is going to be constructing a dictionary that we can insert into our books list. We can create this as part of calling the append method like so:

def add_book():
    books = get_all_books()

    title = input("Title: ").strip().title()
    author = input("Author: ").strip().title()
    year = input("Year of publication: ").strip()

    books.append({
        "title": title,
        "author": author,
        "year": year,
        "read": "Not read"
    })

Now that we have the new record in our books list, we can just write the whole books list to the JSON file. By opening the file in write mode, we can truncate the original file contents and completely replace it with our new data.

In order to write the data, we're going to use json.dump.

def add_book():
    books = get_all_books()

    title = input("Title: ").strip().title()
    author = input("Author: ").strip().title()
    year = input("Year of publication: ").strip()

    books.append({
        "title": title,
        "author": author,
        "year": year,
        "read": "Not read"
    })

    with open("books.json", "w") as reading_list:
        json.dump(books, reading_list)
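To get a feel for what ends up in the file, here is a sketch that performs the same read, append, and rewrite steps without the prompts. The append_book helper and its path parameter are hypothetical, and the file is created in a temporary folder so the example doesn't touch a real books.json.

```python
import json
import os
import tempfile

def append_book(path, book):
    # Hypothetical helper: the same steps as add_book, minus the input prompts
    with open(path, "r") as reading_list:
        books = json.load(reading_list)

    books.append(book)

    with open(path, "w") as reading_list:
        json.dump(books, reading_list)

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "books.json")

    # Start from an empty JSON array, as recommended in the brief
    with open(path, "w") as reading_list:
        json.dump([], reading_list)

    append_book(path, {"title": "1Q84", "author": "Haruki Murakami", "year": "2009", "read": "Not read"})

    with open(path, "r") as reading_list:
        file_text = reading_list.read()

print(file_text)
# [{"title": "1Q84", "author": "Haruki Murakami", "year": "2009", "read": "Not read"}]
```

Note that the year is stored as a string here, because input always gives us back a string and add_book doesn't convert it.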

Now we can move onto the last of our modifications: the update_reading_list function.

def update_reading_list(operation):
    books = get_all_books()
    matching_books = find_books()

    if matching_books:
        operation(books, matching_books[0])

        with open("books.csv", "w") as reading_list:
            for book in books:
                reading_list.write(f"{book['title']},{book['author']},{book['year']},{book['read']}\n")
    else:
        print("Sorry, we didn't find any books matching that title.")

Here we're actually going to be able to make another simplification, because we no longer need to manually format what we're writing to the file. It's enough that we just use dump to translate the books list into JSON.

def update_reading_list(operation):
    books = get_all_books()
    matching_books = find_books()

    if matching_books:
        operation(books, matching_books[0])

        with open("books.json", "w") as reading_list:
            json.dump(books, reading_list)
    else:
        print("Sorry, we didn't find any books matching that title.")

With that, we're done with all of our required changes, but there are a few other things we can do.

First, we can use our new ** unpacking syntax to shorten our show_books function. At the moment we have this:

def show_books(books):
    # Adds an empty line before the output
    print()

    for book in books:
        print(f"{book['title']}, by {book['author']} ({book['year']}) - {book['read']}")

    print()

It's a little verbose having to write a subscription expression for every one of the values here, and we actually have a couple of alternatives.

One option is to unpack book into several variables and use those like this:

def show_books(books):
    # Adds an empty line before the output
    print()

    for book in books:
        title, author, year, read = book.values()
        print(f"{title}, by {author} ({year}) - {read}")

    print()

That's pretty nice, but we can avoid defining all of these variables in advance by using named placeholders with format. If you need a refresher, we covered this all the way back in day 3.

Remember that using ** to unpack a dictionary gives us a series of keyword arguments, and this is exactly how we assign values to named placeholders using format. We can therefore do this:

def show_books(books):
    # Adds an empty line before the output
    print()

    for book in books:
        print("{title}, by {author} ({year}) - {read}".format(**book))

    print()
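As a small sketch of what the ** unpacking is doing, the example below compares it against passing the keyword arguments by hand. It also shows one practical difference between the two approaches: format(**book) matches values by key name, while unpacking book.values() depends on the order of the keys. The reordered dictionary is made up for the comparison.

```python
book = {"title": "1Q84", "author": "Haruki Murakami", "year": "2009", "read": "Not read"}
template = "{title}, by {author} ({year}) - {read}"

# Unpacking with ** is equivalent to writing out each keyword argument
unpacked = template.format(**book)
explicit = template.format(
    title=book["title"],
    author=book["author"],
    year=book["year"],
    read=book["read"]
)
print(unpacked == explicit)  # True

# The same data with the keys in a different order
reordered = {"author": "Haruki Murakami", "title": "1Q84", "read": "Not read", "year": "2009"}

# Named placeholders don't care about key order
print(template.format(**reordered) == unpacked)  # True

# Unpacking values() does: title now holds the author's name
title, author, read, year = reordered.values()
print(title)  # Haruki Murakami
```

For our application the key order is consistent, since every book is written by add_book in the same way, so both approaches work.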

Another thing we can do to upgrade the code is to put in some kind of check to ensure the file exists. This is peeking ahead to tomorrow's content a little bit, but it should be fairly easy to understand.

We're going to create a function like this:

def create_book_file():
    try:
        with open("books.json", "x") as reading_list:
            json.dump([], reading_list)
    except FileExistsError:
        pass

Here we're saying that Python should attempt to create a file called books.json using "x" mode, which means exclusive creation. If this file already exists, open is going to raise a special exception called FileExistsError. Using an except clause, we're telling Python to listen for this error, and if it encounters it while running the code above, it should ignore it.
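To check that this approach doesn't wipe out existing data, here is a sketch using a hypothetical variant of the function, create_json_file, which takes the path as a parameter and reports whether it created the file. It runs against a temporary folder rather than the real books.json.

```python
import json
import os
import tempfile

def create_json_file(path):
    # Hypothetical variant of create_book_file that accepts a path
    # and returns True only when a new file was created
    try:
        with open(path, "x") as new_file:
            json.dump([], new_file)
        return True
    except FileExistsError:
        return False

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "books.json")

    first_call = create_json_file(path)  # the file doesn't exist yet

    # Put some data in the file, as the application would
    with open(path, "w") as reading_list:
        json.dump([{"title": "1Q84"}], reading_list)

    second_call = create_json_file(path)  # the file exists, so nothing happens

    with open(path, "r") as reading_list:
        contents = json.load(reading_list)

print(first_call)   # True
print(second_call)  # False
print(contents)     # [{'title': '1Q84'}]
```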

Now we just have to call create_book_file when we start our program.

With that, we're done! The complete code can be found below:

import json

def add_book():
    books = get_all_books()

    title = input("Title: ").strip().title()
    author = input("Author: ").strip().title()
    year = input("Year of publication: ").strip()

    books.append({
        "title": title,
        "author": author,
        "year": year,
        "read": "Not read"
    })

    with open("books.json", "w") as reading_list:
        json.dump(books, reading_list)

def create_book_file():
    try:
        with open("books.json", "x") as reading_list:
            json.dump([], reading_list)
    except FileExistsError:
        pass

def delete_book(books, book_to_delete):
    books.remove(book_to_delete)

def find_books():
    reading_list = get_all_books()
    matching_books = []

    search_term = input("Please enter a book title: ").strip().lower()

    for book in reading_list:
        if search_term in book["title"].lower():
            matching_books.append(book)

    return matching_books

# Helper function for retrieving data from the JSON file
def get_all_books():
    with open("books.json", "r") as reading_list:
        return json.load(reading_list)

def mark_book_as_read(books, book_to_update):
    index = books.index(book_to_update)
    books[index]['read'] = "Read"

def update_reading_list(operation):
    books = get_all_books()
    matching_books = find_books()

    if matching_books:
        operation(books, matching_books[0])

        with open("books.json", "w") as reading_list:
            json.dump(books, reading_list)
    else:
        print("Sorry, we didn't find any books matching that title.")

def show_books(books):
    # Adds an empty line before the output
    print()

    for book in books:
        print("{title}, by {author} ({year}) - {read}".format(**book))

    print()

create_book_file()

menu_prompt = """Please enter one of the following options:

- 'a' to add a book
- 'd' to delete a book
- 'l' to list the books
- 'r' to mark a book as read
- 's' to search for a book
- 'q' to quit

What would you like to do? """

# Get a selection from the user
selected_option = input(menu_prompt).strip().lower()

# Run the loop until the user selects 'q'
while selected_option != "q":
    if selected_option == "a":
        add_book()
    elif selected_option == "d":
        update_reading_list(delete_book)
    elif selected_option == "l":
        # Retrieves the whole reading list for printing
        reading_list = get_all_books()

        # Check that reading_list contains at least one book
        if reading_list:
            show_books(reading_list)
        else:
            print("Your reading list is empty.")
    elif selected_option == "r":
        update_reading_list(mark_book_as_read)
    elif selected_option == "s":
        matching_books = find_books()

        # Checks that the search returned at least one book
        if matching_books:
            show_books(matching_books)
        else:
            print("Sorry, we didn't find any books for that search term")
    else:
        print(f"Sorry, '{selected_option}' isn't a valid option.")

    # Allow the user to change their selection at the end of each iteration
    selected_option = input(menu_prompt).strip().lower()

As always, I'd encourage you to keep working on the project to see how you can improve it. That may mean adding further functionality, making the code simpler, or making the existing functionality more robust.

See what you can come up with. Happy coding!