TDM 10200: Project 6 — 2023

Motivation: Once we have some data analysis working in Python, we often want to wrap it into a function. Dr Ward usually tests anything that he wrote (usually 5 times), to make sure it works, before wrapping it into a function. Once we are sure our analysis works, if we wrap it into a function, it can usually be easier to use.

Context: Functions also help us to put our work into bite-size pieces that are easier to understand. The basic idea is similar to functions from R or from other languages and tools.

Scope: functions

Make sure to read about, and use the template found here, and the important information about projects submissions here.

Dataset(s)

The following questions will use the following dataset(s):

/anvil/projects/tdm/data/craigslist/vehicles.csv

/anvil/projects/tdm/data/flights/subset/

We have written several examples to get you started with functions here. All 6 of the videos from Project 5 are given at the top of this page.

UPDATE: WE ADDED 3 MORE VIDEOS FOR PROJECT 6, which are given under the heading Project 6 videos in this section.

If it helps, you also have a longer article available here. It is a very detailed article going through many things that you can do with functions in Python. In particular, the section on Argument Passing might be helpful.

Questions

ONE

Read in the dataset from Project 4

/anvil/projects/tdm/data/craigslist/vehicles.csv

and name it cars.

  1. Modify your mycarcount function from Question 1 of Project 5, so that it takes a list of years as a parameter, and prints the number of vehicles from each year in your list.

  2. Now test your mycarcount function on the list of years [2011, 1989, 1997]. The output from mycarcount(cars, [2011, 1989, 1997]) or from mycarcount([2011, 1989, 1997]) should be the number of vehicles from each of the years 2011, 1989, and 1997, respectively.

As in Project 5, you can either have cars and the list of years as two arguments, or you can just have (only) the list of years as an argument and import the data from cars inside the function itself using the chunksize = 10000 and the iterrows like in the videos. Either way is OK.

Items to submit
  • Code used to answer the question.

  • Result of code.

TWO

  1. Write a loop that prints the number of vehicles from chicago as the region in each of the years 2016, 2017, 2018. (I.e., you should have 3 lines of output.)

  2. Now write a double-loop that prints the number of vehicles from each region in the list [chicago, indianapolis, cincinnati] in each of the years 2016, 2017, 2018. (I.e., you should have 9 lines of output.)

Items to submit
  • Code used to answer the question.

  • Result of code.

THREE

  1. Write a function with two arguments: a list of regions, and a list of years. The function should print a listing that shows the number of vehicles from each of those regions in each of those years. (I.e., it will print one line of output for each region during each year.)

  2. Use your new function to re-create the answer to question 2b.

  3. Test your function on some lists of regions and lists of years of your choice.

Items to submit
  • Code used to answer the question.

  • Result of code.

FOUR

Use the function from Question 3a in Project 5.

  1. Write a loop that prints the number of flights that depart from IND as the Origin airport in each of the years 1988, 1989, 1990. (I.e., you should have 3 lines of output.)

  2. Now write a double-loop that prints the number of flights that depart from each of the airports IND, ORD, CVG as the Origin airport in each of the years 1988, 1989, 1990. (I.e., you should have 9 lines of output.)

For this question, you should not read the full data frame all at once, but instead, you should just a few lines at a time, using the chunksize = 10000 and the iterrows like in the videos.

Items to submit
  • Code used to answer the question.

  • Result of code.

FIVE

Extend your function from Question 3 as follows:

  1. Write a function with two arguments: a list of Origin airports, and a list of years. The function should print a listing that shows the number of flights departing from each of those Origin airports in each of those years. (I.e., it will have one line of output for each Origin airport during each year.)

  2. Use your new function to re-create the answer to question 4b.

  3. Test your function on some lists of Origin airports and lists of years of your choice.

Again, for this question, you should not read the full data frame all at once, but instead, you should just a few lines at a time, using the chunksize = 10000 and the iterrows like in the videos.

Items to submit
  • Code used to answer the question.

  • Result of code.

TA applications for The Data Mine are currently being accepted. Please visit us here to apply!

Please make sure to double check that your submission is complete, and contains all of your code and output before submitting. If you are on a spotty internet connection, it is recommended to download your submission after submitting it to make sure what you think you submitted, was what you actually submitted.

In addition, please review our submission guidelines before submitting your project.