Tuples, Lists and Dictionaries with the NOAA Data
This example is based on question 1-5 from TDM 102 Project 4 Spring 2024. Scope: loops, basic data structures such as tuples, lists, loops, dict
These example(s) depend on the database directory:
-
/anvil/projects/tdm/data/noaa
Just as we did in the control flow NOAA dataset example, please remember to use header=None
when you read in the data set, and remember to use: names=["id","date","element_code","value","mflag","qflag","sflag","obstime"]
to create the column names.
Using Pandas iterrows to Identify Dates with High Precipitation Values
Use Pandas iterrows to loop over the 1880.csv data set. For any row that has information about precipitation (in the element code), if the value column in that row is more than 1200, print the date for that row. (Hint: Ten rows meet this condition, so you should print a total of 10 dates.)
import pandas as pd
df_1880 = pd.read_csv("/anvil/projects/tdm/data/noaa/1880.csv", header=None, names=["id","date","element_code","value","mflag","qflag","sflag","obstime"])
for index, row in df_1880.iterrows():
if row['element_code'] == 'PRCP':
if row['value'] >= 1200:
print(row['date'])
18800108 18800109 18800316 18800420 18800706 18800805 18800806 18800806 18800811 18800827
Data Manipulation and Indexing with Pandas
Same question, also using the 1880.csv data set, but now we would like you to use indexing to answer the same question from above.
df_1880[(df_1880.element_code == 'PRCP') & (df_1880.value >= 1200)]['date']
6968 18800108 8868 18800109 74914 18800316 109082 18800420 187316 18800706 218130 18800805 219182 18800806 219199 18800806 223669 18800811 241073 18800827 Name: date, dtype: int64
Average Precipitation Per Year Using a For Loop
Write a for loop
that displays the average precipitation per year, for each year from 1800 to 1850. On each line of your output, print the year and the average precipitation for that year.
year_data = []
for year in range(1800, 1851):
file_path = f"/anvil/projects/tdm/data/noaa/{year}.csv"
year_data.append(pd.read_csv(file_path, header=None, names=["id","date","element_code","value","mflag","qflag","sflag","obstime"]))
count = 1800
for years in year_data:
avg_precipitation = years[years['element_code'] == 'PRCP']['value'].mean()
print(f"Year: {count}, Average Precipitation: {avg_precipitation}")
count += 1
Year: 1800, Average Precipitation: 16.016438356164382 Year: 1801, Average Precipitation: 19.416438356164385 Year: 1802, Average Precipitation: 11.37945205479452 Year: 1803, Average Precipitation: 19.63013698630137 Year: 1804, Average Precipitation: 16.968903436988544 Year: 1805, Average Precipitation: 13.963013698630137 Year: 1806, Average Precipitation: 13.481361426256077 Year: 1807, Average Precipitation: 15.396825396825397 Year: 1808, Average Precipitation: 16.23633879781421 Year: 1809, Average Precipitation: 14.687671232876712 Year: 1810, Average Precipitation: 14.23489010989011 Year: 1811, Average Precipitation: 12.754794520547945 Year: 1812, Average Precipitation: 15.904371584699453 Year: 1813, Average Precipitation: 22.049397590361444 Year: 1814, Average Precipitation: 21.04501216545012 Year: 1815, Average Precipitation: 24.774074074074075 Year: 1816, Average Precipitation: 20.02589395807645 Year: 1817, Average Precipitation: 20.357409713574096 Year: 1818, Average Precipitation: 18.72570725707257 Year: 1819, Average Precipitation: 23.2260101010101 Year: 1820, Average Precipitation: 17.293478260869566 Year: 1821, Average Precipitation: 19.30807453416149 Year: 1822, Average Precipitation: 15.856035437430787 Year: 1823, Average Precipitation: 18.665745856353592 Year: 1824, Average Precipitation: 16.27336860670194 Year: 1825, Average Precipitation: 16.06637168141593 Year: 1826, Average Precipitation: 18.396419437340153 Year: 1827, Average Precipitation: 17.94869053981828 Year: 1828, Average Precipitation: 15.623736029803087 Year: 1829, Average Precipitation: 18.914679385267622 Year: 1830, Average Precipitation: 19.94102431453699 Year: 1831, Average Precipitation: 20.252832131822863 Year: 1832, Average Precipitation: 18.167397020157757 Year: 1833, Average Precipitation: 22.844265439202147 Year: 1834, Average Precipitation: 17.816445830085737 Year: 1835, Average Precipitation: 19.74567806377257 Year: 1836, Average Precipitation: 22.259059122695486 Year: 1837, Average Precipitation: 20.14407894736842 Year: 1838, Average Precipitation: 21.130470236477304 Year: 1839, Average Precipitation: 23.25336838646073 Year: 1840, Average Precipitation: 19.12202807218562 Year: 1841, Average Precipitation: 19.92908787541713 Year: 1842, Average Precipitation: 20.738959537572253 Year: 1843, Average Precipitation: 24.628443449048152 Year: 1844, Average Precipitation: 21.45823611759216 Year: 1845, Average Precipitation: 24.910962715637172 Year: 1846, Average Precipitation: 24.366737739872068 Year: 1847, Average Precipitation: 18.475292187820383 Year: 1848, Average Precipitation: 21.18880662020906 Year: 1849, Average Precipitation: 20.899487583760347 Year: 1850, Average Precipitation: 22.181586679725758
While Loop for Average Precipitation (1800-1813)
Change your for loop to a while loop, which prints the average precipitation, for each year from 1800 to 1850 BUT stops printing after the first year with average precipitation 22 or higher. (Hint: You will see that, because of the behavior of your while loop, it should print the average precipitation for the years 1800 to 1813.)
year_data = []
for year in range(1800, 1851):
file_path = f"/anvil/projects/tdm/data/noaa/{year}.csv"
year_data.append(pd.read_csv(file_path, header=None, names=["id","date","element_code","value","mflag","qflag","sflag","obstime"]))
count = 1800
index = 0
while index < len(year_data):
years = year_data[index]
avg_precipitation = years[years['element_code'] == 'PRCP']['value'].mean()
print(f"Year: {count}, Average Precipitation: {avg_precipitation}")
if avg_precipitation >= 22:
break
count += 1
index += 1
Year: 1800, Average Precipitation: 16.016438356164382 Year: 1801, Average Precipitation: 19.416438356164385 Year: 1802, Average Precipitation: 11.37945205479452 Year: 1803, Average Precipitation: 19.63013698630137 Year: 1804, Average Precipitation: 16.968903436988544 Year: 1805, Average Precipitation: 13.963013698630137 Year: 1806, Average Precipitation: 13.481361426256077 Year: 1807, Average Precipitation: 15.396825396825397 Year: 1808, Average Precipitation: 16.23633879781421 Year: 1809, Average Precipitation: 14.687671232876712 Year: 1810, Average Precipitation: 14.23489010989011 Year: 1811, Average Precipitation: 12.754794520547945 Year: 1812, Average Precipitation: 15.904371584699453 Year: 1813, Average Precipitation: 22.049397590361444
Largest Average Precipitation ID in 1880.csv
For the 1880.csv data, find the average precipitation for each id. Which id has the largest average precipitation? (Hint: The average precipitation for this id is 610; which id has that largest average precipitation?)
avg_precipitation_per_id = df_1880[df_1880['element_code'] == 'PRCP'].groupby('id')['value'].mean()
max_avg_id = avg_precipitation_per_id.idxmax()
print(f"ID with the largest average precipitation: {max_avg_id}")
ID with the largest average precipitation: USC00483073
Average Precipitation for USC00288878
What is the average precipitation for the id USC00288878?
avg_precipitation_usc00288878 = avg_precipitation_per_id['USC00288878']
print(f"Average precipitation for ID USC00288878: {avg_precipitation_usc00288878}")
Average precipitation for ID USC00288878: 35.08516483516483
Convert Results to Dictionary
Change the results from avg_precipitation_per_id
series into a dictionary. (Hint: Depending on how you solved question 3a, if you did it like Dr Ward did it, you probably got a series in question 3a, and you can probably use the to_dict() method to convert the series into a dictionary.)
avg_precipitation_dict = avg_precipitation_per_id.to_dict()
avg_precipitation_dict
{ 'AGE00135039': 11.618705035971223, 'AGE00147705': 17.115819209039547, 'AGE00147708': 12.319884726224783, 'AGE00147709': 12.345821325648416, 'AGE00147711': 2.8526785714285716, 'AGE00147712': 8.412844036697248, 'AGE00147713': 7.915300546448087, ... 'USW00024128': 4.551912568306011, 'USW00024274': 36.01092896174863, 'USW00053875': 47.19125683060109, 'USW00093725': 26.96448087431694, 'USW00093852': 45.478142076502735, 'USW00094728': 25.442622950819672 }