STAT 29000: Project 9 — Spring 2022

Motivation: Python is an interpreted language (as opposed to a compiled language). In a compiled language, you are (mostly) unable to run and evaluate a single instruction at a time. In Python (and R — also an interpreted language), we can run and evaluate a line of code easily using a repl. In fact, this is the way you’ve been using Python to date — selecting and running pieces of Python code. Other ways to use Python include creating a package (like numpy, pandas, and pytorch), and creating scripts. You can create powerful CLI’s (command line interface) tools using Python. In this project, we will explore this in detail and learn how to create scripts that accept options and input and perform tasks.

Context: This is the second (of two) projects where we will learn about creating and using Python scripts.

Scope: Python, argparse

Learning Objectives

Write a python script that accepts user inputs and returns something useful.

Make sure to read about, and use the template found here, and the important information about projects submissions here.

Dataset(s)

The following questions will use the following dataset(s):

/depot/datamine/data/*

Questions

Question 1

Scripts are really powerful. It is cool to think about baking tons of powerful functionality into a single file, sharing it with others, and enabling them to use what you built. When you build a script, CLI (command line interface), or some other Python tool that accepts inputs and options, argparse is really the go to package. It hides away lots of the complex logic that can be involved with programs that have lots of inputs or lots of options. We want to encourage you to use and learn more about this package.

The argparse documentation is a bit rough, and can take some time to navigate, understand, and figure out how to do what you want to do. This is the official documentation, and this is the official argparse tutorial.

The template we provided from the previous project is a pretty great starting point. The upside to the template is that it is flexible enough to handle most scenarios for a given a script. The downside is that it may be more than what is needed for a given script, and as a result, involve an unnecessary extra command (for example something, if you look at the template. Instead of ./myscript something some_other_input you could have ./myscript some_other_input.). The same template can be found below.

import argparse
import pandas as pd

def main():
	parser = argparse.ArgumentParser()
	subparsers = parser.add_subparsers(help="possible commands", dest="command")
	some_parser = subparsers.add_parser("something", help="")
	some_parser.add_argument("-o", "--output", help="directory to output file(s) to")

	if len(sys.argv) == 1:
		parser.print_help()
		sys.exit(1)

	args = parser.parse_args()

	if args.command == "something":
		something()

if __name__ == "__main__":
	main()

At the heart of the template script is the subparsers. This is what is responsible for both the flexibility, and potential to have an extra, not necessary argument. This links to the section in the documentation on subparsers.

Let’s imagine that you were tasked to write a script for a media company. This company does the weather, news, and sports (among other imaginary things). In as much detail as you can muster, explain why using subparsers for such a script would be useful. Assume that this script would fulfill special tasks related to all 3 categoriesof media. Explain why or why not it would be easily possible to bake functionality for all 3 categories of media into a single script without using subparsers. All answers showing a strong effort will receive full credit. If you created any experimental scripts to help understand subparsers, please include those in a code cell in your notebook!

Items to submit

Code used to solve this problem.
Output from running the code.

Question 2

A project based around writing a script is the perfect opportunity to get creative! If you are part of Corporate Partners, you could use this project as a chance to implement something useful for your team! If you aren’t, you could write a script to help automate some task that you find you need to repeat a lot. Some ideas of scripts would be:

Write a script that scrapes recent sports data about one or more teams and prints the output to the screen, neat and formatted (and maybe even colored if you want).
Write a script that processes and cleans up a specifically formatted type of data and inserts it into a database.
Write a script that processes images or text and outputs some sort of summary or analysis (you could use the transformers package we used in the previous project for this! Just make sure the script is unique and not like the script from the previous project).
Write a script that accepts an image and prints out the ascii art version of the image.
Write a script that accepts an image and determines whether or not your face is in the image.
Write a script that solves a sudoku or other puzzle (wordle?).
Write a script that accepts images and creates a collage or mosaic of the images.

All of the examples above have the capability of having any number of flags or options that could slightly change the programs behavior. For example, the ascii art program could have an option that randomly colors the characters, or the collage script may have an option that makes the images black and white, etc.

Write a script, in two stages. The script could be more complicated or less complicated than any/all of the provided examples — we just want a strong effort.

Make sure to document both stages using a combination of code and markdown cells in your jupyter notebook. The readers should be able to read about what your script does, why, and then proceed to look at examples to see it in action.
Make sure to include your (final) script (as a .py file) in your submission.
Make sure you run your script (with all sorts of combinations of flags and options that shows off the capabilities) from code cells in your notebook. The reader should be able to see and understand what your code does based on the output from the code cells. Since your script may use data we do not have access to, the script does not need to work for anyone, but the full capabilities should be demonstrated with the output from code cells.

The first stage should involve building up the first version of your script. As mentioned before, your script could do anything. We only ask that your script meets the following criteria:

The script must have at least 1 optional flag, that, when present, indicates that the script should be run in a different way.
The script must have help text for every argument or flag in the script.
The script must have at least 1 positional, required argument.
The script must have at least 1 optional, not required argument. (Note: This is different than the flag. This argument should accept a value and not just use defaults, like a flag does).
The script must accept both a long or short version of all optional arguments. For example -v and --verbose.

The second stage is to enhance your first version of your script. This could be via arguments and options that are included in argparse, or by using another package like this rich package. This enhancement could be anything, the only requirement is that you explain what the enhancement is and why it is an enhancement.

Remember, you should use our Python environment’s shebang so that all of our packages are ready to use.

#!/scratch/brown/kamstut/tdm/apps/jupyter/kernels/f2021-s2022/.venv/bin/python

Remember, you will probably need to give your script execute permissions before you can run it. If your script is called myscript.py, and it lives in your $HOME directory, you could give execute permissions as follows.

chmod +x $HOME/myscript.py

Items to submit

Code used to solve this problem.
Output from running the code.

Question 3 (optional, 0 pts)

Swap scripts with a friend and run your friends script with a variety of flags and options. Suggest 1 or more improvements your friend could make to the script. Trade back and implement the suggestion(s).

We’d love for you to do this! Please make sure to put your friends name at the top of your solution to this question, so we can know you collaborated on this problem.

Items to submit

Code used to solve this problem.
Output from running the code.

Please make sure to double check that your submission is complete, and contains all of your code and output before submitting. If you are on a spotty internet connect ion, it is recommended to download your submission after submitting it to make sure what you think you submitted, was what you actually submitted.

In addition, please review our submission guidelines before submitting your project.