STAT 39000: Project 9 — Spring 2021

Motivation: In the previous project you worked through some common logic needed to make a good script. By the end of the project argparse was (hopefully) a welcome package to be able to use. In this project, we are going to continue to learn about argparse and create a CLI for the WHIN Data Portal. In doing so, not only will we get to practice using argparse, but you will also get to learn about using an API to retrieve data. An API (application programming interface) is a common way to retrieve structured data from a company or resource. It is common for large companies like Twitter, Facebook, Google, etc. to make certain data available via API’s, so it is important to get some exposure.

Context: This is the second (of two) projects where we will learn about creating and using Python scripts.

Scope: python

Learning objectives
  • Write a python script that accepts user inputs and returns something useful.

  • Interact with an API to retrieve data.

Make sure to read about, and use the template found here, and the important information about projects submissions here.

Dataset

The following questions will involve retrieving data using an API. Instructions and hints will be provided as we go.

Questions

Question 1

WHIN (Wabash Heartland Innovation Network) has deployed hundreds of weather stations across the region so farmers can use the data collected to become more efficient, save time, and increase yields. WHIN has kindly granted access to 20+ public-facing weather stations for educational purposes.

Navigate to data.whin.org/data/current-conditions, and click on the "CREATE ACCOUNT" button in the middle of the screen:

![](./images/p9_01.png)

Click on "I’m a student or educator":

![](./images/p9_02.png)

Enter your information. For "School or Organization" please enter "Purdue University". For "Class or project", please put "The Data Mine Project 9". For the description, please put "We are learning about writing scripts by writing a CLI to fetch data from the WHIN API." Please use your purdue.edu email address. Once complete, click "Next".

Carefully read the LICENSE TERMS before accepting, and confirm your email address if needed. Upon completion, navigate here: data.whin.org/data/current-conditions

Read about the API under "API Usage". An endpoint is the place (in this case the end of a URL (which can be referred to as the URI)) that you can use to access/delete/update/etc. a given resource depending on the HTTP method used. What are the 3 endpoints of this API?

Write and run a script called question01.py that, when run, tries to print the current listing of the weather stations. Instead of printing what you think it should print, it will print something else. What happened?

$HOME/question01.py

You can use the requests library to run the HTTP GET method on the endpoint. For example:

import requests
response = requests.get("https://datamine.purdue.edu/")
print(response.json())

We want to use our regular course environment, therefore, make sure to use the following shebang: #!/class/datamine/apps/python/f2020-s2021/env/bin/python

Items to submit
  • List the 3 endpoints for this API.

  • The entirety of the updated (working) script’s content in a Python code chunk with chunk option "eval=F".

  • Output from running your script with the given examples.

Question 2

In question 1, we quickly realize that we are missing a critical step — authentication! Recall that authentication is the process of a system understanding who a person is, authorization is the process of telling whether or not somebody (or something) has permissions to access/modify/delete/etc. a resource. When we make GET requests to data.whin.org/api/weather/stations, or any other endpoint, the server that returns the data we are trying to access has no clue who we are, which explains the result from question 1.

While there are many methods of authentication, WHIN is using Bearer tokens. Navigate here. Take a look at your account info. You should see a large chunk of random numbers and text. This is your bearer token that you can use for authentication. The bearer token is to be sent in the "Authorization" header of the request. For example:

import requests
my_headers = {"Authorization": "Bearer LDFKGHSOIDFRUTRLKJNXDFGT"}
response = requests.get("my_url", headers = my_headers)

Update your script (as a new script called question02.py), and test it out again to see if we get the expected results now. question02.py should only print the first 5 results.

A couple important notes:

  • The bearer token should be taken care of like a password. You do NOT want to share this, ever.

  • There is an inherent risk in saving code like the code shown above. What if you accidentally upload it to GitHub? Then anyone with access could potentially read and use your token.

How can we include the token in our code without typing it in our code? The typical way to handle this is to use environment variables and/or a file containing the information that is specifically NOT shared unless necessary. For example, create a file called .env in your home directory, with the following contents:

MY_BEARER_TOKEN=aslgdkjn304iunglsejkrht09
SOME_OTHER_VARIABLE=some_other_value

In this file, replace the "aslgdkj…​" part with you actual token and save the file. Then make sure only YOU can read and write to this file by running the following in a terminal:

chmod 600 $HOME/.env

Now, we can use a package called dotenv to load the variables in the $HOME/.env file into the environment. We can then use the os package to get the environment variables. For example:

import os
from dotenv import load_dotenv
# This function will load the .env file variables from the same directory as the script into the environment
load_dotenv()
# We can now use os.getenv to get the important information without showing anything.
# Now, all anybody reading the code sees is "os.getenv('MY_BEARER_TOKEN')" even though that is replaced by the actual
# token when the code is run, cool!
my_headers = {"Authorization": f"Bearer {os.getenv('MY_BEARER_TOKEN')}"}

Update question02.py to use dotenv and os.getenv to get the token from the local $HOME/.env file. Test out your script:

$HOME/question02.py
Items to submit
  • The entirety of the updated (working) script’s content in a Python code chunk with chunk option "eval=F".

  • Output from running your script with the given example.

Question 3

That’s not so bad! We now know how to retrieve data from the API as well as load up variables from our environment rather than insecurely just pasting them in our code, great!

A query parameter is (more or less) some extra information added at the end of the endpoint. For example, the following url has a query parameter called param and value called value: https://example.com/some_resource?param=value. You could even add more than one query parameter as follows: https://example.com/some_resource?param=value&second_param=second_value — as you can see, now we have another parameter called second_param with a value of second_value. While the query parameters begin with a ?, each subsequent parameter is added using &.

Query parameters can be optional or required. API’s will sometimes utilize query parameters to filter or fine-tune the returned results. Look at the documentation for the /api/weather/station-daily endpoint. Use your newfound knowledge of query parameters to update your script (as a new script called question03.py) to retrieve the data for station with id 150 on 2021-01-05, and print the first 5 results. Test out your script:

$HOME/question03.py
Items to submit
  • The entirety of the updated (working) script’s content in a Python code chunk with chunk option "eval=F".

  • Output from running your script with the given example.

Question 4

Excellent, now let’s build our CLI. Call the script whin.py. Use your knowledge of requests, argparse, and API’s to write a CLI that replicates the behavior shown below. For convenience, only print the first 2 results for all output.

  • In general, there will be 3 commands: stations, daily, and cc (for current condition).

  • You will want to create a subparser for each command: stations_parser, current_conditions_parser, and daily_parser.

  • The daily_parser will have 2 position, required arguments: station_id and date.

  • The current_conditions_parser will have 2 optional arguments of type str: --center/-c and --radius/-r.

  • If only one of --center or --radius is present, you should use sys.exit to print a message saying "Need both center AND radius, or neither.".

  • To create a subparser, just do the following:

parser = argparse.ArgumentParser()
subparsers = parser.add_subparsers(help="possible commands", dest="command")
my_subparser = subparsers.add_parser("my_command", help="my help message")
my_subparser.add_argument("--my-option", type=str, help="some option")
args = parser.parse_args()
  • Then, you can access which command was run with args.command (which in this case would only have 1 possible value of my_command), and access any parser or subparsers options with args, for example, args.my_option.

$HOME/whin.py
usage: whin.py [-h] {stations,cc,daily} ...
positional arguments:
  {stations,cc,daily}  possible commands
    stations           list the stations
    cc                 list the most recent data from each weather station
    daily              list data from a given day and station
optional arguments:
  -h, --help           show this help message and exit

A good way to print the help information if no arguments are provided is:

if len(sys.argv) == 1:
    parser.print_help()
    parser.exit()
$HOME/whin.py stations -h
usage: whin.py stations [-h]
optional arguments:
  -h, --help  show this help message and exit
$HOME/whin.py cc -h
usage: whin.py cc [-h] [-c CENTER] [-r RADIUS]
optional arguments:
  -h, --help            show this help message and exit
  -c CENTER, --center CENTER
                        return results near this center coordinate, given as a
                        latitude,longitude pair
  -r RADIUS, --radius RADIUS
                        search distance, in meters, from the center
$HOME/whin.py cc
[{'humidity': 90, 'latitude': 40.93894, 'longitude': -86.47418, 'name': 'WHIN001-PULA001', 'observation_time': '2021-03-16T18:45:00Z', 'pressure': '30.051', 'rain': '0', 'rain_inches_last_hour': '0', 'soil_moist_1': 6, 'soil_moist_2': 11, 'soil_moist_3': 14, 'soil_moist_4': 9, 'soil_temp_1': 42, 'soil_temp_2': 40, 'soil_temp_3': 40, 'soil_temp_4': 41, 'solar_radiation': 203, 'solar_radiation_high': 244, 'station_id': 1, 'temperature': 40, 'temperature_high': 40, 'temperature_low': 40, 'wind_direction_degrees': '337.5', 'wind_gust_direction_degrees': '22.5', 'wind_gust_speed_mph': 6, 'wind_speed_mph': 3}, {'humidity': 88, 'latitude': 40.73083, 'longitude': -86.98467, 'name': 'WHIN003-WHIT001', 'observation_time': '2021-03-16T18:45:00Z', 'pressure': '30.051', 'rain': '0', 'rain_inches_last_hour': '0', 'soil_moist_1': 6, 'soil_moist_2': 5, 'soil_moist_3': 6, 'soil_moist_4': 4, 'soil_temp_1': 40, 'soil_temp_2': 39, 'soil_temp_3': 39, 'soil_temp_4': 40, 'solar_radiation': 156, 'solar_radiation_high': 171, 'station_id': 3, 'temperature': 40, 'temperature_high': 40, 'temperature_low': 39, 'wind_direction_degrees': '337.5', 'wind_gust_direction_degrees': '337.5', 'wind_gust_speed_mph': 8, 'wind_speed_mph': 3}]

Your values may be different because they are current conditions.

$HOME/whin.py cc --radius=10000
Need both center AND radius, or neither.
$HOME/whin.py cc --center=40.4258686,-86.9080654
Need both center AND radius, or neither.
$HOME/whin.py cc --center=40.4258686,-86.9080654 --radius=10000
[{'humidity': 86, 'latitude': 40.42919, 'longitude': -86.84547, 'name': 'WHIN008-TIPP005 Chatham Square', 'observation_time': '2021-03-16T18:45:00Z', 'pressure': '30.012', 'rain': '0', 'rain_inches_last_hour': '0', 'soil_moist_1': 5, 'soil_moist_2': 5, 'soil_moist_3': 5, 'soil_moist_4': 5, 'soil_temp_1': 42, 'soil_temp_2': 41, 'soil_temp_3': 41, 'soil_temp_4': 42, 'solar_radiation': 191, 'solar_radiation_high': 220, 'station_id': 8, 'temperature': 42, 'temperature_high': 42, 'temperature_low': 42, 'wind_direction_degrees': '0', 'wind_gust_direction_degrees': '22.5', 'wind_gust_speed_mph': 9, 'wind_speed_mph': 3}, {'humidity': 86, 'latitude': 40.38494, 'longitude': -86.84577, 'name': 'WHIN027-TIPP003 EXT', 'observation_time': '2021-03-16T18:45:00Z', 'pressure': '29.515', 'rain': '0', 'rain_inches_last_hour': '0', 'soil_moist_1': 5, 'soil_moist_2': 4, 'soil_moist_3': 4, 'soil_moist_4': 5, 'soil_temp_1': 43, 'soil_temp_2': 42, 'soil_temp_3': 42, 'soil_temp_4': 42, 'solar_radiation': 221, 'solar_radiation_high': 244, 'station_id': 27, 'temperature': 43, 'temperature_high': 43, 'temperature_low': 43, 'wind_direction_degrees': '337.5', 'wind_gust_direction_degrees': '337.5', 'wind_gust_speed_mph': 6, 'wind_speed_mph': 3}]
$HOME/whin.py daily
usage: whin.py daily [-h] station_id date
whin.py daily: error: too few arguments
$HOME/whin.py daily 150 2021-01-05
[{'humidity': 96, 'latitude': 41.00467, 'longitude': -86.68428, 'name': 'WHIN058-PULA007', 'observation_time': '2021-01-05T05:00:00Z', 'pressure': '29.213', 'rain': '0', 'rain_inches_last_hour': '0', 'soil_moist_1': 5, 'soil_moist_2': 6, 'soil_moist_3': 7, 'soil_moist_4': 5, 'soil_temp_1': 33, 'soil_temp_2': 34, 'soil_temp_3': 35, 'soil_temp_4': 35, 'solar_radiation': 0, 'solar_radiation_high': 0, 'station_id': 150, 'temperature': 31, 'temperature_high': 31, 'temperature_low': 31, 'wind_direction_degrees': '270', 'wind_gust_direction_degrees': '292.5', 'wind_gust_speed_mph': 13, 'wind_speed_mph': 8}, {'humidity': 96, 'latitude': 41.00467, 'longitude': -86.68428, 'name': 'WHIN058-PULA007', 'observation_time': '2021-01-05T05:15:00Z', 'pressure': '29.207', 'rain': '1', 'rain_inches_last_hour': '0', 'soil_moist_1': 5, 'soil_moist_2': 6, 'soil_moist_3': 7, 'soil_moist_4': 5, 'soil_temp_1': 33, 'soil_temp_2': 34, 'soil_temp_3': 35, 'soil_temp_4': 35, 'solar_radiation': 0, 'solar_radiation_high': 0, 'station_id': 150, 'temperature': 31, 'temperature_high': 31, 'temperature_low': 31, 'wind_direction_degrees': '270', 'wind_gust_direction_degrees': '292.5', 'wind_gust_speed_mph': 14, 'wind_speed_mph': 9}]
Items to submit
  • The entirety of the updated (working) script’s content in a Python code chunk with chunk option "eval=F".

  • Output from running your script with the given examples.

Question 5

There are a multitude of improvements and/or features that we could add to whin.py. Customize your script (as a new script called question05.py), to either do something new, or fix a scenario that wasn’t covered in question 4. Be sure to include 1-2 sentences that explains exactly what your modification does. Demonstrate the feature by running it in a bash code chunk.

Items to submit
  • The entirety of the updated (working) script’s content in a Python code chunk with chunk option "eval=F".

  • Output from running your script with the given examples.