TDM 30100: Project 3 — 2022

Motivation: Documentation is one of the most critical parts of a project. There are so many tools that are specifically designed to help document a project, and each have their own set of pros and cons. Depending on the scope and scale of the project, different tools will be more or less appropriate. For documenting Python code, however, you can’t go wrong with tools like Sphinx, or pdoc.

Context: This is the second project in a 2-project series where we explore thoroughly documenting Python code, while solving data-driven problems.

Scope: Python, documentation

Learning Objectives
  • Use Sphinx to document a set of Python code.

  • Use pdoc to document a set of Python code.

  • Write and use code that serializes and deserializes data.

  • Learn the pros and cons of various serialization formats.

Make sure to read about, and use the template found here, and the important information about projects submissions here.

Dataset(s)

The following questions will use the following dataset(s):

  • /anvil/projects/tdm/data/apple/health/watch_dump.xml

Questions

Please use Firefox for this project. We can’t guarantee good results if you do not.

Before you begin, open Firefox, and where you would normally put a URL, type the following, followed by enter/return.

about:config

Search for "network.cookie.sameSite.laxByDefault", and change the value to false, and close the tab.

Question 1

  1. Create a new directory in your $HOME directory called project03: $HOME/project03

  2. Create a new Jupyter notebook in that folder called project03.ipynb, based on the normal project template: $HOME/project03/project03.ipynb

    The majority of this notebook will just contain a single bash cell with the commands used to re-generate the documentation. This is okay, and by design. The main deliverable for this project will end up being some output from the documentation generator — this will be explicitly specified as we go along and at the end of the project.

  3. Create a module called firstname_lastname_project03.py in your $HOME/project03 directory, with the following contents.

    """This module is for project 3 for TDM 30100.
    
    **Serialization:** Serialization is the process of taking a set or subset of data and transforming it into a specific file format that is designed for transmission over a network, storage, or some other specific use-case.
    
    **Deserialization:** Deserialization is the opposite process from serialization where the serialized data is reverted back into its original form.
    
    The following are some common serialization formats:
    
    - JSON
    - Bincode
    - MessagePack
    - YAML
    - TOML
    - Pickle
    - BSON
    - CBOR
    - Parquet
    - XML
    - Protobuf
    
    **JSON:** One of the more wide-spread serialization formats, JSON has the advantages that it is human readable, and has a excellent set of optimized tools written to serialize and deserialize. In addition, it has first-rate support in browsers. A disadvantage is that it is not a fantastic format storage-wise (it takes up lots of space), and parsing large JSON files can use a lot of memory.
    
    **MessagePack:** MessagePack is a non-human-readable file format (binary) that is extremely fast to serialize and deserialize, and is extremely efficient space-wise. It has excellent tooling in many different languages. It is still not the *most* space efficient, or *fastest* to serialize/deserialize, and remains impossible to work with in its serialized form.
    
    Generally, each format is either *human-readable* or *not*. Human readable formats are able to be read by a human when opened up in a text editor, for example. Non human-readable formats are typically in some binary format and will look like random nonsense when opened in a text editor.
    """
    
    
    import lxml
    import lxml.etree
    from datetime import datetime, date
    
    
    def get_records_for_date(tree: lxml.etree._ElementTree, for_date: date) -> list:
        """
        Given an `lxml.etree` object and a `datetime.date` object, return a list of records
        with the startDate equal to `for_date`.
        Args:
            tree (lxml.etree): The watch_dump.xml file as an `lxml.etree` object.
            for_date (datetime.date): The date for which returned records should have a startDate equal to.
        Raises:
            TypeError: If `tree` is not an `lxml.etree` object.
            TypeError: If `for_date` is not a `datetime.date` object.
        Returns:
            list: A list of records with the startDate equal to `for_date`.
        """
    
        if not isinstance(tree, lxml.etree._ElementTree):
            raise TypeError('tree must be an lxml.etree')
    
        if not isinstance(for_date, date):
            raise TypeError('for_date must be a datetime.date')
    
        results = []
        for record in tree.xpath('/HealthData/Record'):
            if for_date == datetime.strptime(record.attrib.get('startDate'), '%Y-%m-%d %X %z').date():
                results.append(record)
    
        return results

    Make sure you change "firstname" and "lastname" to your first and last name.

  4. In a bash cell in your project03.ipynb notebook, run the following.

    %%bash
    
    cd $HOME/project03
    python3 -m sphinx.cmd.quickstart ./docs -q -p project03 -a "Firstname Lastname" -v 1.0.0 --sep

    Please replace "Firstname" and "Lastname" with your own name.

    What do all of these arguments do? Check out this page of the official documentation.

You should be left with a newly created docs directory within your project03 directory: $HOME/project03/docs. The directory structure should look similar to the following.

contents
project03(1)
├── 39000_f2021_project03_solutions.ipynb(2)
├── docs(3)
│   ├── build (4)
│   ├── make.bat
│   ├── Makefile (5)
│   └── source (6)
│       ├── conf.py (7)
│       ├── index.rst (8)
│       ├── _static
│       └── _templates
└── kevin_amstutz_project03.py(9)

5 directories, 6 files
1 Our module (named project03) folder
2 Your project notebook (probably named something like firstname_lastname_project03.ipynb)
3 Your documentation folder
4 Your empty build folder where generated documentation will be stored
5 The Makefile used to run the commands that generate your documentation
6 Your source folder. This folder contains all hand-typed documentation
7 Your conf.py file. This file contains the configuration for your documentation.
8 Your index.rst file. This file (and all files ending in .rst) is written in reStructuredText — a Markdown-like syntax.
9 Your module. This is the module containing the code from the previous project, with nice, clean docstrings.

Please make the following modifications:

  1. To Makefile:

    # replace
    SPHINXOPTS    ?=
    SPHINXBUILD   ?= sphinx-build
    SOURCEDIR     = source
    BUILDDIR      = build
    
    # with the following
    SPHINXOPTS    ?=
    SPHINXBUILD   ?= python3 -m sphinx.cmd.build
    SOURCEDIR     = source
    BUILDDIR      = build
  2. To conf.py:

    # CHANGE THE FOLLOWING CONTENT FROM:
    
    # -- Path setup --------------------------------------------------------------
    
    # If extensions (or modules to document with autodoc) are in another directory,
    # add these directories to sys.path here. If the directory is relative to the
    # documentation root, use os.path.abspath to make it absolute, like shown here.
    #
    # import os
    # import sys
    # sys.path.insert(0, os.path.abspath('.')
    
    # TO:
    
    # -- Path setup --------------------------------------------------------------
    
    # If extensions (or modules to document with autodoc) are in another directory,
    # add these directories to sys.path here. If the directory is relative to the
    # documentation root, use os.path.abspath to make it absolute, like shown here.
    #
    import os
    import sys
    sys.path.insert(0, os.path.abspath('../..'))

Finally, with the modifications above having been made, run the following command in a bash cell in Jupyter notebook to generate your documentation.

cd $HOME/project03/docs
make html

After complete, your module folders structure should look something like the following.

structure
project03
├── 39000_f2021_project03_solutions.ipynb
├── docs
│   ├── build
│   │   ├── doctrees
│   │   │   ├── environment.pickle
│   │   │   └── index.doctree
│   │   └── html
│   │       ├── genindex.html
│   │       ├── index.html
│   │       ├── objects.inv
│   │       ├── search.html
│   │       ├── searchindex.js
│   │       ├── _sources
│   │       │   └── index.rst.txt
│   │       └── _static
│   │           ├── alabaster.css
│   │           ├── basic.css
│   │           ├── custom.css
│   │           ├── doctools.js
│   │           ├── documentation_options.js
│   │           ├── file.png
│   │           ├── jquery-3.5.1.js
│   │           ├── jquery.js
│   │           ├── language_data.js
│   │           ├── minus.png
│   │           ├── plus.png
│   │           ├── pygments.css
│   │           ├── searchtools.js
│   │           ├── underscore-1.13.1.js
│   │           └── underscore.js
│   ├── make.bat
│   ├── Makefile
│   └── source
│       ├── conf.py
│       ├── index.rst
│       ├── _static
│       └── _templates
└── kevin_amstutz_project03.py

9 directories, 29 files

Finally, let’s take a look at the results! In the left-hand pane in the Jupyter Lab interface, navigate to $HOME/project03/docs/build/html/, and right click on the index.html file and choose Open in New Browser Tab. You should now be able to see your documentation in a new tab. It should look something like the following.

Resulting Sphinx output
Figure 1. Resulting Sphinx output

Make sure you are able to generate the documentation before you proceed, otherwise, you will not be able to continue to modify, regenerate, and view your documentation.

Items to submit
  • Code used to solve this problem (in 2 Jupyter bash cells).

Question 2

One of the most important documents in any package or project is the README.md file. This file is so important that version control companies like GitHub and GitLab will automatically display it below the repositories contents. This file contains things like instructions on how to install the packages, usage examples, lists of dependencies, license links, etc. Check out some popular GitHub repositories for projects like numpy, pytorch, or any other repository you’ve come across that you believe does a good job explaining the project.

In the docs/source folder, create a new file called README.rst. Choose 3-5 of the following "types" of reStruturedText from the this webpage, and create a fake README. The content can be Lorem Ipsum type of content as long as it demonstrates 3-5 of the types of reStruturedText.

  • Inline markup

  • Lists and quote-like blocks

  • Literal blocks

  • Doctest blocks

  • Tables

  • Hyperlinks

  • Sections

  • Field lists

  • Roles

  • Images

  • Footnotes

  • Citations

  • Etc.

Make sure to include at least 1 section. This counts as 1 of your 3-5.

Once complete, add a reference to your README to the index.rst file. To add a reference to your README.rst file, open the index.rst file in an editor and add "README" as follows.

index.rst
.. project3 documentation master file, created by
   sphinx-quickstart on Wed Sep  1 09:38:12 2021.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

Welcome to project3's documentation!
====================================

.. toctree::
   :maxdepth: 2
   :caption: Contents:

   README

Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`

Make sure "README" is aligned with ":caption:" — it should be 3 spaces from the left before the "R" in "README".

In a new bash cell in your notebook, regenerate your documentation.

%%bash

cd $HOME/project03/docs
make html

Check out the resulting index.html page, and click on the links. Pretty great!

Things should look similar to the following images.

Sphinx output
Figure 2. Sphinx output
Sphinx output
Figure 3. Sphinx output
Items to submit

Question 3

The pdoc package was specifically designed to generate documentation for Python modules using the docstrings in the module. As you may have noticed, this is not "native" to Sphinx.

Sphinx has extensions. One such extension is the autodoc extension. This extension provides the same sort of functionality that pdoc provides natively.

To use this extension, modify the conf.py file in the docs/source folder.

# -- General configuration ---------------------------------------------------

# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
    'sphinx.ext.autodoc'
]

Next, update your index.rst file so autodoc knows which modules to extract data from.

.. project3 documentation master file, created by
   sphinx-quickstart on Wed Sep  1 09:38:12 2021.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

Welcome to project3's documentation!
====================================

.. automodule:: firstname_lastname_project03
    :members:

.. toctree::
   :maxdepth: 2
   :caption: Contents:

   README

Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`

In a new bash cell in your notebook, regenerate your documentation. Check out the resulting index.html page, and click on the links. Not too bad!

Items to submit

Question 4

Okay, while the documentation looks pretty good, clearly, Sphinx does not recognize Google style docstrings. As you may have guessed, there is an extension for that.

Add the napoleon extension to your conf.py file.

# -- General configuration ---------------------------------------------------

# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
    'sphinx.ext.autodoc',
    'sphinx.ext.napoleon'
]

In a new bash cell in your notebook, regenerate your documentation. Check out the resulting index.html page, and click on the links. Much better!

Items to submit

Please make sure to double check that your submission is complete, and contains all of your code and output before submitting. If you are on a spotty internet connection, it is recommended to download your submission after submitting it to make sure what you think you submitted, was what you actually submitted.

In addition, please review our submission guidelines before submitting your project.