TDM 10200: Project 8 — 2024

Motivation: We will continue to introduce functions and visualization

Context: Write functions with visualizations

Scope: python, functions, pandas, matplotlib, Parquet columnar storage file format

Reading and Resources



We added eleven new videos to help you with Project 8.

You need to use 3 cores for your Jupyter Lab session for Project 8 this week.

You can use pd.read_parquet to read in a parquet file (very similarly to how you use pd.read_csv to read in a csv file!)

You can use pd.set_option('display.max_columns', None) if you want to see all of the columns in a very wide data frame.


Question 1 (2 points)

Read the file into a DataFrame called myDF.

  1. Convert the observation_time column to into a datetime type.

  2. Create 3 new columns for the year, month and day, based on the column observation_time.

  3. For a given station_id, calculate the average month-and-year-pair temperatures (from the column temperature) for that station_id. Try this for a few different station_id values.

  4. Now write a function called get_avg_temp that takes one station_id as input and returns the average month-and-year-pair temperatures (associated with that specific station_id). Make sure that the results of your function match with your work from question 1c.

Question 2 (2 points)

For this function, be sure to import matplotlib.pyplot.

We will use the function from question 1d to make some line plots.

  1. For a given station_id, create a line plot, with one line for each year. Try this for a few different station_id values.

  2. Now that you are sure your analysis from 2a works well, wrap your work from question 2a into a function that takes a station_id as input, and creates a line plot, with one line for each year (for the average month-and-year-pair temperatures from that station_id).

Question 3 (2 points)

  1. Revisit the function from question 1d, to find the maximum temperature (instead of the average temperature) in each month-and-year-pair, for a given station. As before, you should test this for several examples before you build the function, and then make sure your function matches your examples.

  2. Revisit the function from question 2b, to make a function that takes one station_id as input and it creates a bar plot (instead of a line plot), depicting the maximum temperature in each month-and-year-pair (instead of the average temperature).

Your work from question 3b can utilize the function you build in question 3a.

Question 4 (2 points)

  1. For a given station_id, create a box plot that shows the month-by-month wind speeds in 2020 for that specified station_id. Try this for a few different station_id values.

  2. Write a function that takes a year (not necessarily 2020) and a station_id as inputs, and the function creates a box plot about the month-by-month wind speeds in that specific year (not necessarily 2020), at the specified station_id.

Question 5 (2 points)

  1. Explore the dateset and find something interesting, like (for instance) something about the wind speed, pressure, soil temperature, etc., and do some analysis.

  2. Make a visualization that shows one or more plots about your analysis.

  3. Wrap the work 5a and 5b into a function that can be used to create the visualizations in a systematic way, and test the function with the same inputs used in 5a and 5b.

Project 08 Assignment Checklist

  • Jupyter Lab notebook with your code, comments and output for the assignment

    • firstname-lastname-project08.ipynb.

  • Python file with code and comments for the assignment


  • Submit files through Gradescope

Please make sure to double check that your submission is complete, and contains all of your code and output before submitting. If you are on a spotty internet connection, it is recommended to download your submission after submitting it to make sure what you think you submitted, was what you actually submitted.

In addition, please review our submission guidelines before submitting your project.