Data Transformations with the WHIN Dataset

This example is from TDM 102 Project 9 Spring 2024.

These examples depend on the following data file:

  • /anvil/projects/tdm/data/whin/weather.parquet

Learn more about the dataset here.

2a. Find out how many null records exist in myDF, within each individual column. (Your answer should specify, for each column, how many null records are in that column.)

myDF.isnull().sum()
Column Name                    Null Count
station_id                     0
latitude                       0
longitude                      0
name                           0
observation_time               0
temperature                    4176
temperature_high               4123
temperature_low                4123
humidity                       4162
solar_radiation                48104
solar_radiation_high           12828
rain                           0
rain_inches_last_hour          0
wind_speed_mph                 4032
wind_direction_degrees         95545
wind_gust_speed_mph            0
wind_gust_direction_degrees    94216
pressure                       0
soil_temp_1                    114043
soil_temp_2                    114041
soil_temp_3                    114037
soil_temp_4                    114036
soil_moist_1                   114124
soil_moist_2                   114170
soil_moist_3                   114174
soil_moist_4                   114150
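
To see how the per-column count in 2a works, the same call can be tried on a tiny frame. This is only a sketch: toy_df and its values are made up for illustration (only the column names are borrowed from the WHIN data), and the filtering step at the end is an optional extra rather than part of the project solution.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the values are invented for illustration only.
toy_df = pd.DataFrame({
    "station_id": [1, 2, 3, 4],
    "temperature": [71.0, np.nan, 68.5, 70.2],
    "soil_temp_1": [np.nan, np.nan, 55.1, np.nan],
})

# isnull() marks every missing cell as True;
# sum() then counts the True values in each column.
null_counts = toy_df.isnull().sum()
print(null_counts)

# Optional: keep only the columns that contain nulls, largest count first.
columns_with_nulls = null_counts[null_counts > 0].sort_values(ascending=False)
print(columns_with_nulls)
```

On a wide table such as the one above, the filtered view can make it easier to spot which columns (here, the soil and wind direction measurements) account for most of the missing data.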

2b. Now count the total number of null values in the entire data frame myDF. (In other words, add up the values from all of the counts in part 2a.)

myDF.isnull().sum().sum()
1184084
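
The chained call in 2b sums the per-column counts from 2a. As a sanity check, the nonzero counts listed in 2a can be added by hand, and the same pattern can be tried on a small frame. The frame toy_df below is hypothetical and its values are made up; the list of counts is copied from the 2a output.

```python
import numpy as np
import pandas as pd

# Nonzero null counts copied from the 2a output (columns with 0 nulls omitted).
counts_from_2a = [
    4176, 4123, 4123, 4162,          # temperature, temperature_high/low, humidity
    48104, 12828,                    # solar_radiation, solar_radiation_high
    4032, 95545, 94216,              # wind speed, wind direction, gust direction
    114043, 114041, 114037, 114036,  # soil_temp_1..4
    114124, 114170, 114174, 114150,  # soil_moist_1..4
]
total_from_2a = sum(counts_from_2a)
print(total_from_2a)

# Hypothetical toy data; the values are invented for illustration only.
toy_df = pd.DataFrame({
    "station_id": [1, 2, 3, 4],
    "temperature": [71.0, np.nan, 68.5, 70.2],
    "soil_temp_1": [np.nan, np.nan, 55.1, np.nan],
})

# First sum(): nulls per column. Second sum(): total over all columns.
total_nulls = toy_df.isnull().sum().sum()

# An equivalent way to count: sum the whole Boolean array at once.
total_nulls_alt = toy_df.isnull().to_numpy().sum()
print(total_nulls, total_nulls_alt)
```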

2c. Drop rows with any null values in myDF. Save the resulting cleaned data set into a new DataFrame called myDF_cleaned.

myDF_cleaned = myDF.dropna()
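
With its default arguments, dropna() removes every row that has at least one null value and returns a new DataFrame, leaving the original unchanged. The sketch below illustrates this on a hypothetical toy_df with made-up values; it is not the WHIN data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the values are invented for illustration only.
toy_df = pd.DataFrame({
    "station_id": [1, 2, 3, 4],
    "temperature": [71.0, np.nan, 68.5, 70.2],
    "soil_temp_1": [np.nan, np.nan, 55.1, np.nan],
})

# Default behavior is axis=0 (rows) and how="any":
# a row is dropped if any of its values is null.
toy_df_cleaned = toy_df.dropna()

# The original frame still has all of its rows.
print(len(toy_df), len(toy_df_cleaned))

# Only the row with no missing values survives, and it keeps its original index label.
print(toy_df_cleaned)
```

Because so many of the soil columns are null in the WHIN data, dropping rows with any null value can be expected to remove a large share of the rows, so it is worth comparing the row counts before and after.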

2d. Just to make sure that you did this properly, check myDF_cleaned carefully: Are there any null values remaining in myDF_cleaned? (There should not be.) How many rows and columns are in myDF_cleaned?

myDF_cleaned.isnull().sum().sum()
0
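
The second half of 2d asks for the number of rows and columns, which the shape attribute reports as a (rows, columns) tuple. The sketch below shows both checks on a hypothetical toy frame with made-up values; on the real data the same two lines would be applied to myDF_cleaned.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the values are invented for illustration only.
toy_df = pd.DataFrame({
    "station_id": [1, 2, 3, 4],
    "temperature": [71.0, np.nan, 68.5, 70.2],
    "soil_temp_1": [np.nan, np.nan, 55.1, np.nan],
})
toy_df_cleaned = toy_df.dropna()

# Check 1: no null values should remain after dropna().
remaining_nulls = int(toy_df_cleaned.isnull().sum().sum())

# Check 2: shape gives (number of rows, number of columns).
n_rows, n_cols = toy_df_cleaned.shape
print(remaining_nulls, n_rows, n_cols)
```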