TDM 10200: Python Project 11 — Spring 2025
Motivation: We continue to learn how to create functions and apply the functions to many data sets in an efficient way. We can extract information in a straightforward way.
Context: We learn how to apply functions using map
and using list comprehensions.
Scope: Applying functions to data.
Dataset(s)
This project will use the following datasets:
-
/anvil/projects/tdm/data/flights/subset/*
(flights data) -
/anvil/projects/tdm/data/election/itcont/*
(election data) -
/anvil/projects/tdm/data/icecream/*
(ice cream data)
In this project, we walk students through these powerful techniques.
It should be enough to use 2 cores in your Jupyter Lab session for this project. Some of the questions in this project take a few minutes to run. Please remember to make comments (in full English sentences) about your understanding of each of the questions and solutions. This is especially important in this project, where we walk you through the solutions, but we still want you to write some explanations of your own! |
Questions
Question 1 (2 pts)
For this question, only use the data corresponding to flights with Origin
at the Indianapolis IND
airport.
Hint: If you only want to import the 3 columns of the data that you need, recall that column 1 (the second column) is the month; column 15 (the 16th column) is the DepDelay; and column 16 (the 17th column) is the Origin. Python starts numbering from 0.
-
Write a function called
monthlydepdelays
that takes a year as the input and usesgroupby
to return a Series of length 12 with the averageDepDelay
for flights starting atIND
in each of the 12 months of that year. -
Test your function individually, one at a time, on the years 1990, 1998, and 2005. For instance, if you run
monthlydepdelays(1990)
, the output should be something like:
1 7.282772
2 9.497027
3 6.924841
4 4.949858
5 5.471487
6 6.010835
7 4.307377
8 5.639782
9 4.455586
10 4.473725
11 3.408304
12 9.764105
-
Write a function called
monthlydepdelays
that takes a year as the input and returns a Series of length 12 with the averageDepDelay
for flights starting atIND
in each of the 12 months of that year. -
Show the output of
monthlydepdelays(1990)
andmonthlydepdelays(1998)
andmonthlydepdelays(2005)
.
Question 2 (2 pts)
Consider this page, about using matplotlib
to display several plots all at once:
We want to display 6 plots, which appear in 3 rows and 2 columns, namely: plot the results of monthlydepdelays
for the years 1988 through 1993.
It could look as simple as this:
import matplotlib.pyplot as plt
results1988 = monthlydepdelays(1988)
results1989 = monthlydepdelays(1989)
results1990 = monthlydepdelays(1990)
results1991 = monthlydepdelays(1991)
results1992 = monthlydepdelays(1992)
results1993 = monthlydepdelays(1993)
fig, axs = plt.subplots(2, 3)
axs[0, 0].bar(results1988.index, results1988)
axs[0, 0].set_title("1988 Delays")
axs[0, 1].bar(results1989.index, results1989)
axs[0, 1].set_title("1989 Delays")
axs[0, 2].bar(results1990.index, results1990)
axs[0, 2].set_title("1990 Delays")
axs[1, 0].bar(results1991.index, results1991)
axs[1, 0].set_title("1991 Delays")
axs[1, 1].bar(results1992.index, results1992)
axs[1, 1].set_title("1992 Delays")
axs[1, 2].bar(results1993.index, results1993)
axs[1, 2].set_title("1993 Delays")
or if you want to use a loop, you could write:
import matplotlib.pyplot as plt
allmyyears = [myyear for myyear in range(1988,1994)]
myresults = [monthlydepdelays(myyear) for myyear in allmyyears]
nrow = 2
ncol = 3
fig, axs = plt.subplots(nrow, ncol)
for i, ax in enumerate(fig.axes):
ax.bar(myresults[i].index, myresults[i])
ax.set_title(str(allmyyears[i]) + " Delays")
-
Make a 3 by 2 frame of 6 plots, corresponding to the results of
monthlydepdelays
in the years 1988 through 1993.
Question 3 (2 pts)
For this question, only use the data corresponding to donations from the state of Indiana.
-
Write a function called
myindycities
that takes a year as the input and usesgroupby
to make a Series of length 10, containing the top 10 cities in Indiana according to the sum of the amount of donations (in dollars) given in each city, during that year. -
Test your function individually, one at a time, on the years 1980, 1986, and 1992. For instance, if you run
myindycities(1984)
, the output should be something like:
FT WAYNE 44665
TERRE HAUTE 52650
CARMEL 53200
EVANSVILLE 65250
SOUTH BEND 68387
INDPLS 76520
FORT WAYNE 80882
ELKHART 93171
MUNCIE 104260
INDIANAPOLIS 511935
-
Write a function called
myindycities
that takes a year as the input and usesgroupby
to make a Series of length 10, containing the top 10 cities in Indiana according to the sum of the amount of donations (in dollars) given in each city, during that year. -
Show the output of
myindycities(1980)
andmyindycities(1986)
andmyindycities(1992)
.
Question 4 (2 pts)
-
Run the function
myindycities
on each of the even-numbered election years 1984 to 1994 as follows:
allmyyears = [myyear for myyear in range(1984,1995,2)]
myresults = [myindycities(myyear) for myyear in allmyyears]
-
Now display 6 plots, which appear in 3 rows and 2 columns, namely: plot the results of
myindycities
for the even-numbered years 1984 through 1994.
If you want to use a loop, you could write:
import matplotlib.pyplot as plt
nrow = 2
ncol = 3
fig, axs = plt.subplots(nrow, ncol)
for i, ax in enumerate(fig.axes):
ax.barh(myresults[i].index, myresults[i])
ax.set_title(str(allmyyears[i]) + " Indy Cities")
Do not worry about the fact that the plots are overlapping. |
-
Show the results of
myindycities
for each even-numbered year from 1984 to 1994. -
Make a horizontal bar plot for each of the 6 years in part a.
Question 5 (2 pts)
-
Find the average number of stars in each of these four files:
/anvil/projects/tdm/data/icecream/bj/reviews.csv
/anvil/projects/tdm/data/icecream/breyers/reviews.csv
/anvil/projects/tdm/data/icecream/hd/reviews.csv
/anvil/projects/tdm/data/icecream/talenti/reviews.csv
-
Write a function
myavgstars
that takes a company name (e.g., either "bj" or "breyers" or "hd" or "talenti") as input, and returns the average number of stars for that company. -
Define a vector of length 4, with all 4 of these company names:
mycompanies = ["bj", "breyers", "hd", "talenti"]
and now use a list comprehension to run the function from part b that re-computes the values from part a, all at once, like this:
[myavgstars(x) for x in mycompanies]
-
Print the average number of stars for each of the 4 ice cream companies.
-
Write a function
myavgstars
that takes a company name (e.g., either "bj" or "breyers" or "hd" or "talenti") as input, and returns the average number of stars for that company. -
Use a list comprehension to run the function from part b on the list
mycompanies
, which should give the same values as in part a.
Submitting your Work
Please make sure that you added comments for each question, which explain your thinking about your method of solving each question. Please also make sure that your work is your own work, and that any outside sources (people, internet pages, generating AI, etc.) are cited properly in the project template.
If you have any questions or issues regarding this project, please feel free to ask in seminar, over Piazza, or during office hours.
Prior to submitting your work, you need to put your work into the project template, and re-run all of the code in your Jupyter notebook and make sure that the results of running that code is visible in your template. Please check the detailed instructions on how to ensure that your submission is formatted correctly. To download your completed project, you can right-click on the file in the file explorer and click 'download'.
Once you upload your submission to Gradescope, make sure that everything appears as you would expect to ensure that you don’t lose any points.
-
firstname_lastname_project11.ipynb
It is necessary to document your work, with comments about each solution. All of your work needs to be your own work, with citations to any source that you used. Please make sure that your work is your own work, and that any outside sources (people, internet pages, generating AI, etc.) are cited properly in the project template. You must double check your Please take the time to double check your work. See here for instructions on how to double check this. You will not receive full credit if your |