Tuples, Lists and Dictionaries with the NOAA Data

This example is based on questions 1-5 from TDM 102 Project 4, Spring 2024. Scope: loops and basic data structures such as tuples, lists, and dictionaries.

These examples depend on the data directory:

/anvil/projects/tdm/data/noaa

The NOAA data set includes weather-related information from multiple weather stations across the United States. It contains daily records for several years in CSV files.

As in the control flow NOAA dataset example, the CSV files have no header row. Remember to use header=None when you read in the data set, and use names=["id","date","element_code","value","mflag","qflag","sflag","obstime"] to create the column names.
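
As a minimal sketch of what header=None and names= do, the snippet below reads two rows from an in-memory string instead of the Anvil directory. The sample rows and the station id STATION0001 are made up for illustration; only the column names come from the text above.

```python
import io
import pandas as pd

# Column names from the text above; the NOAA files have no header row.
NOAA_COLUMNS = ["id", "date", "element_code", "value",
                "mflag", "qflag", "sflag", "obstime"]

# Hypothetical sample rows standing in for a real yearly CSV file.
sample_csv = io.StringIO(
    "STATION0001,18800101,PRCP,25,,,X,\n"
    "STATION0001,18800101,TMAX,111,,,X,\n"
)

# header=None stops pandas from treating the first data row as column names.
df_sample = pd.read_csv(sample_csv, header=None, names=NOAA_COLUMNS)

print(df_sample.shape)
print(list(df_sample.columns))
```

Without header=None, the first observation would be consumed as the header and silently dropped from the data.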

Using Pandas iterrows to Identify Dates with High Precipitation Values

Use Pandas iterrows to loop over the 1880.csv data set. For any row that has information about precipitation (element code PRCP), if the value column in that row is 1200 or more, print the date for that row. (Hint: Ten rows meet this condition, so you should print a total of 10 dates.)

import pandas as pd

df_1880 = pd.read_csv("/anvil/projects/tdm/data/noaa/1880.csv", header=None, names=["id","date","element_code","value","mflag","qflag","sflag","obstime"])

for index, row in df_1880.iterrows():
    if row['element_code'] == 'PRCP':
        if row['value'] >= 1200:
            print(row['date'])
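
The same pattern can also collect the matching dates in a list instead of printing them, which makes the result reusable later. This is only a sketch on a small made-up DataFrame (the rows below are hypothetical, not taken from 1880.csv); it uses the same >= 1200 test as the code above.

```python
import pandas as pd

# Hypothetical rows; the real 1880.csv has the same column layout.
df_demo = pd.DataFrame({
    "date": [18800108, 18800108, 18800316, 18800420],
    "element_code": ["PRCP", "TMAX", "PRCP", "PRCP"],
    "value": [1250, 300, 1200, 40],
})

wet_dates = []  # dates whose precipitation value is at least 1200
for index, row in df_demo.iterrows():
    # Both conditions combined with `and` instead of two nested ifs.
    if row["element_code"] == "PRCP" and row["value"] >= 1200:
        wet_dates.append(row["date"])

print(wet_dates)
```

Here len(wet_dates) gives the number of matching rows, which is a quick way to check a hint such as "ten rows meet this condition".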

Data Manipulation and Indexing with Pandas

Answer the same question for the 1880.csv data set, but this time use indexing instead of a loop.

df_1880.loc[(df_1880['element_code'] == 'PRCP') & (df_1880['value'] >= 1200), 'date']

6968      18800108
8868      18800109
74914     18800316
109082    18800420
187316    18800706
218130    18800805
219182    18800806
219199    18800806
223669    18800811
241073    18800827
Name: date, dtype: int64
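
To see how the indexing version works, it can help to build the two Boolean masks separately before combining them. The sketch below uses a small made-up DataFrame (hypothetical rows, not the real file) and the names is_prcp and is_large, which are chosen here only for illustration.

```python
import pandas as pd

# Hypothetical rows; the real 1880.csv has the same column layout.
df_demo = pd.DataFrame({
    "date": [18800108, 18800108, 18800316, 18800420],
    "element_code": ["PRCP", "TMAX", "PRCP", "PRCP"],
    "value": [1250, 300, 1200, 40],
})

# Each comparison yields one True/False entry per row.
is_prcp = df_demo["element_code"] == "PRCP"
is_large = df_demo["value"] >= 1200

# & combines the masks element-wise; .loc then keeps the rows where both
# are True and selects only the date column.
selected_dates = df_demo.loc[is_prcp & is_large, "date"]

print(selected_dates.tolist())
```

The result keeps the original row labels as its index, which is why the output above shows row numbers such as 6968 next to each date.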

Average Precipitation Per Year Using a For Loop

Write a for loop that displays the average precipitation per year, for each year from 1800 to 1850. On each line of your output, print the year and the average precipitation for that year.

year_data = []

for year in range(1800, 1851):
    file_path = f"/anvil/projects/tdm/data/noaa/{year}.csv"
    year_data.append(pd.read_csv(file_path, header=None, names=["id","date","element_code","value","mflag","qflag","sflag","obstime"]))

# enumerate pairs each DataFrame with its year, so no manual counter is needed.
for year, year_df in enumerate(year_data, start=1800):
    avg_precipitation = year_df[year_df['element_code'] == 'PRCP']['value'].mean()
    print(f"Year: {year}, Average Precipitation: {avg_precipitation}")

Year: 1800, Average Precipitation: 16.016438356164382
Year: 1801, Average Precipitation: 19.416438356164385
Year: 1802, Average Precipitation: 11.37945205479452
Year: 1803, Average Precipitation: 19.63013698630137
Year: 1804, Average Precipitation: 16.968903436988544
Year: 1805, Average Precipitation: 13.963013698630137
Year: 1806, Average Precipitation: 13.481361426256077
Year: 1807, Average Precipitation: 15.396825396825397
Year: 1808, Average Precipitation: 16.23633879781421
Year: 1809, Average Precipitation: 14.687671232876712
Year: 1810, Average Precipitation: 14.23489010989011
Year: 1811, Average Precipitation: 12.754794520547945
Year: 1812, Average Precipitation: 15.904371584699453
Year: 1813, Average Precipitation: 22.049397590361444
Year: 1814, Average Precipitation: 21.04501216545012
Year: 1815, Average Precipitation: 24.774074074074075
Year: 1816, Average Precipitation: 20.02589395807645
Year: 1817, Average Precipitation: 20.357409713574096
Year: 1818, Average Precipitation: 18.72570725707257
Year: 1819, Average Precipitation: 23.2260101010101
Year: 1820, Average Precipitation: 17.293478260869566
Year: 1821, Average Precipitation: 19.30807453416149
Year: 1822, Average Precipitation: 15.856035437430787
Year: 1823, Average Precipitation: 18.665745856353592
Year: 1824, Average Precipitation: 16.27336860670194
Year: 1825, Average Precipitation: 16.06637168141593
Year: 1826, Average Precipitation: 18.396419437340153
Year: 1827, Average Precipitation: 17.94869053981828
Year: 1828, Average Precipitation: 15.623736029803087
Year: 1829, Average Precipitation: 18.914679385267622
Year: 1830, Average Precipitation: 19.94102431453699
Year: 1831, Average Precipitation: 20.252832131822863
Year: 1832, Average Precipitation: 18.167397020157757
Year: 1833, Average Precipitation: 22.844265439202147
Year: 1834, Average Precipitation: 17.816445830085737
Year: 1835, Average Precipitation: 19.74567806377257
Year: 1836, Average Precipitation: 22.259059122695486
Year: 1837, Average Precipitation: 20.14407894736842
Year: 1838, Average Precipitation: 21.130470236477304
Year: 1839, Average Precipitation: 23.25336838646073
Year: 1840, Average Precipitation: 19.12202807218562
Year: 1841, Average Precipitation: 19.92908787541713
Year: 1842, Average Precipitation: 20.738959537572253
Year: 1843, Average Precipitation: 24.628443449048152
Year: 1844, Average Precipitation: 21.45823611759216
Year: 1845, Average Precipitation: 24.910962715637172
Year: 1846, Average Precipitation: 24.366737739872068
Year: 1847, Average Precipitation: 18.475292187820383
Year: 1848, Average Precipitation: 21.18880662020906
Year: 1849, Average Precipitation: 20.899487583760347
Year: 1850, Average Precipitation: 22.181586679725758
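
Since this example is about tuples and lists, one possible variation is to store each result as a (year, average) tuple in a list rather than only printing it. The sketch below is an illustration under assumptions: frames_by_year is a hypothetical dictionary of tiny made-up DataFrames standing in for the yearly files.

```python
import pandas as pd

# Hypothetical stand-ins for the yearly files, keyed by year.
frames_by_year = {
    1800: pd.DataFrame({"element_code": ["PRCP", "PRCP", "TMAX"],
                        "value": [10, 20, 300]}),
    1801: pd.DataFrame({"element_code": ["PRCP", "TMAX"],
                        "value": [30, 250]}),
}

yearly_averages = []  # list of (year, average) tuples
for year, frame in frames_by_year.items():
    prcp_values = frame.loc[frame["element_code"] == "PRCP", "value"]
    yearly_averages.append((year, prcp_values.mean()))

# Tuple unpacking gives the year and its average back together.
for year, avg in yearly_averages:
    print(f"Year: {year}, Average Precipitation: {avg}")
```

Keeping the year and its average in one tuple means the two values cannot drift out of step, which can happen with a separately maintained counter.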

While Loop for Average Precipitation (1800-1813)

Change your for loop into a while loop that prints the average precipitation for each year from 1800 to 1850 but stops after the first year whose average precipitation is 22 or higher. (Hint: Because the loop stops there, it should print the average precipitation for the years 1800 through 1813.)

year_data = []

for year in range(1800, 1851):
    file_path = f"/anvil/projects/tdm/data/noaa/{year}.csv"
    year_data.append(pd.read_csv(file_path, header=None, names=["id","date","element_code","value","mflag","qflag","sflag","obstime"]))

count = 1800
index = 0

while index < len(year_data):
    years = year_data[index]
    avg_precipitation = years[years['element_code'] == 'PRCP']['value'].mean()
    print(f"Year: {count}, Average Precipitation: {avg_precipitation}")
    if avg_precipitation >= 22:
        break
    count += 1
    index += 1

Year: 1800, Average Precipitation: 16.016438356164382
Year: 1801, Average Precipitation: 19.416438356164385
Year: 1802, Average Precipitation: 11.37945205479452
Year: 1803, Average Precipitation: 19.63013698630137
Year: 1804, Average Precipitation: 16.968903436988544
Year: 1805, Average Precipitation: 13.963013698630137
Year: 1806, Average Precipitation: 13.481361426256077
Year: 1807, Average Precipitation: 15.396825396825397
Year: 1808, Average Precipitation: 16.23633879781421
Year: 1809, Average Precipitation: 14.687671232876712
Year: 1810, Average Precipitation: 14.23489010989011
Year: 1811, Average Precipitation: 12.754794520547945
Year: 1812, Average Precipitation: 15.904371584699453
Year: 1813, Average Precipitation: 22.049397590361444
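
The stopping rule can also be written into the while condition itself instead of using break. The sketch below is one possible alternative, not the project's required solution; it runs on a short list of (year, average) tuples whose values are rounded from the output above, and keep_going and printed_years are names introduced here for illustration.

```python
# Averages rounded from the output above, for four consecutive years.
yearly_averages = [(1811, 12.75), (1812, 15.90), (1813, 22.05), (1814, 21.05)]

printed_years = []   # records which years were actually printed
index = 0
keep_going = True    # flips to False once an average reaches 22

while keep_going and index < len(yearly_averages):
    year, avg = yearly_averages[index]
    print(f"Year: {year}, Average Precipitation: {avg}")
    printed_years.append(year)
    if avg >= 22:
        keep_going = False  # the year that triggers the stop is still printed
    index += 1

print(printed_years)
```

In both versions the year that crosses the threshold is printed before the loop ends, which is why 1813 appears in the output even though 1814 does not.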

Largest Average Precipitation ID in 1880.csv

For the 1880.csv data, find the average precipitation for each id. Which id has the largest average precipitation? (Hint: The largest average precipitation is 610.)

avg_precipitation_per_id = df_1880[df_1880['element_code'] == 'PRCP'].groupby('id')['value'].mean()

max_avg_id = avg_precipitation_per_id.idxmax()
print(f"ID with the largest average precipitation: {max_avg_id}")

ID with the largest average precipitation: USC00483073
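
idxmax returns the label (the id) of the largest value, while max returns the value itself, so printing both is an easy way to check the answer against the hint. The sketch below shows the idea on made-up data; the station ids STN_A and STN_B are hypothetical.

```python
import pandas as pd

# Hypothetical observations for two stations.
df_demo = pd.DataFrame({
    "id": ["STN_A", "STN_A", "STN_B", "STN_B", "STN_B"],
    "element_code": ["PRCP", "PRCP", "PRCP", "TMAX", "PRCP"],
    "value": [10, 30, 50, 400, 70],
})

# Keep only precipitation rows, then average the values within each id.
prcp_only = df_demo[df_demo["element_code"] == "PRCP"]
avg_per_id = prcp_only.groupby("id")["value"].mean()

top_id = avg_per_id.idxmax()    # index label of the largest average
top_value = avg_per_id.max()    # the largest average itself

print(f"{top_id}: {top_value}")
```

Note that the TMAX row for STN_B is filtered out before grouping, so it does not inflate that station's average.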

Average Precipitation for USC00288878

What is the average precipitation for the id USC00288878?

avg_precipitation_usc00288878 = avg_precipitation_per_id['USC00288878']
print(f"Average precipitation for ID USC00288878: {avg_precipitation_usc00288878}")

Average precipitation for ID USC00288878: 35.08516483516483

Convert Results to Dictionary

Convert the avg_precipitation_per_id Series into a dictionary. (Hint: If you solved question 3a the way Dr. Ward did, you probably got a Series in question 3a, and you can use the to_dict() method to convert the Series into a dictionary.)

avg_precipitation_dict = avg_precipitation_per_id.to_dict()
avg_precipitation_dict

{
 'AGE00135039': 11.618705035971223,
 'AGE00147705': 17.115819209039547,
 'AGE00147708': 12.319884726224783,
 'AGE00147709': 12.345821325648416,
 'AGE00147711': 2.8526785714285716,
 'AGE00147712': 8.412844036697248,
 'AGE00147713': 7.915300546448087,
 ...
 'USW00024128': 4.551912568306011,
 'USW00024274': 36.01092896174863,
 'USW00053875': 47.19125683060109,
 'USW00093725': 26.96448087431694,
 'USW00093852': 45.478142076502735,
 'USW00094728': 25.442622950819672
}
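
Once the averages are in a dictionary, ordinary Python dictionary tools apply. The sketch below uses three entries copied from the output above; the variable names avg_precipitation_subset, wettest_id, and ranked are introduced here only for illustration.

```python
# Three entries copied from the dictionary output above.
avg_precipitation_subset = {
    "AGE00147711": 2.8526785714285716,
    "USW00053875": 47.19125683060109,
    "USW00094728": 25.442622950819672,
}

# Look up one id; .get returns None instead of raising KeyError
# when the id is not in the dictionary.
print(avg_precipitation_subset.get("USW00094728"))
print(avg_precipitation_subset.get("NOT_A_STATION"))

# Key whose value is largest (within this subset only).
wettest_id = max(avg_precipitation_subset, key=avg_precipitation_subset.get)

# .items() yields (id, average) tuples; sort them from largest to smallest.
ranked = sorted(avg_precipitation_subset.items(),
                key=lambda pair: pair[1], reverse=True)

print(wettest_id)
print(ranked)
```

The max call plays the same role for a dictionary that idxmax plays for a Series, and ranked is a list of tuples, tying together the three data structures named in the title.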