TDM 10200: Project 5 — 2024

Motivation: Once we have some data analysis working in Python, we often want to wrap it into a function. Dr Ward usually tests anything that he wrote (usually 5 times), to make sure it works, before wrapping it into a function. Once we are sure our analysis works, if we wrap it into a function, it can usually be easier to use.

Context: Functions also help us to put our work into bite-size pieces that are easier to understand. The basic idea is similar to functions from R or from other languages and tools.

Scope: functions



Reading and Resources

  • Make sure to read about, and use the template found here, and the important information about projects submissions here.

  • We strongly encourage you to (please) look at our many examples of Python Functions

  • Dr Ward added 4 additional videos that are specific to Project 5


Question 1 (2 points)

  1. Write a function called avg_aggreg_temp that takes 5 parameters: the file_location as a string, the column_title_list as a list of column titles, the start_date as an integer, the end_date as an integer, and the temperature element_code as a string, with default value "TAVG". Your function should output the average value for rows with that element_code. For instance, in the default case (where element_code is "TAVG"), your function should output the average value which is the average of the average temperatures (as a decimal number).

  2. Run the function for on the data set 2018.csv, using start_date 20180101 and end_date 20180115, and with "TAVG" as the element_code.

  • Remember that the NOAA data does not include column titles, so use column_title_list=["id","date","element_code","value","mflag","qflag","sflag","obstime"]

  • The data set is approximately 2.2 GB. It is OK to use only 1 core for your Jupyter Lab session, if you set chunksize = 10000 as you read in data.

  • Input the start_date and the end_date as integers in the form yyyymmdd

  • You will need to include a docstring that clearly explains how the function is defined.

Question 2 (2 points)

  1. Create a function that takes a list of years (or, if you prefer, a list of file locations), as a list of column names, and an element_code as input, and returns a dictionary with one entry per year. In the dictionary, for each year, it should have the year as the key and the average value of the specified element_code as the value for that year.

  2. Test your function for the element_code "TAVG" and for the range of four years 1880 to 1883 (inclusive), i.e., range(1880,1884).

  • You can EITHER use a list of years or a list of file locations for the input, but please explain all of your work, and be sure to provide documentation about how a person uses your function. Documentation is really important!

  • Include column titles ["id","date","element_code","value","mflag","qflag","sflag","obstime"]

Question 3 (2 points)

  1. Modify the function that you created in Question 2, to include an extra parameter for the month. This function should have the same behavior as in Question 2, but for each year, the function should only use the data from that month of the year.

  2. Test your function for the element_code "TAVG" and for the range of four years 1880 to 1883 (inclusive), i.e., range(1880,1884), and for the month August in each year.

Question 4 (2 points)

  1. Create a function that takes a list of years as input, and identifies the year that has the most qflags of the type that the user specified.

  2. Run the function for years in the range 1880 to 1883, and test it with some various qflag values, such as D, G, I, K, L, N, O, S, X.

Question 5 (2 points)

  1. Explore the dataset files from the noaa directory, and create a function of your own design, about something that interests you. Make sure to include a docstring that explains the function’s definition.

  2. Run your function, and explain the inputs and outputs, so that a user can understand how it works.

Project 05 Assignment Checklist

  • Jupyter Lab notebook with your code, comments and output for the assignment

    • firstname-lastname-project05.ipynb.

  • Python file with code and comments for the assignment


  • Submit files through Gradescope

Please make sure to double check that your submission is complete, and contains all of your code and output before submitting. If you are on a spotty internet connection, it is recommended to download your submission after submitting it to make sure what you think you submitted, was what you actually submitted.

In addition, please review our submission guidelines before submitting your project.