Data Transformations with the WHIN Dataset
This example is from TDM 102 Project 9 Spring 2024.
This example depends on the following dataset:
- /anvil/projects/tdm/data/whin/weather.parquet
Learn more about the dataset here.
2a. Find out how many null records exist in myDF, within each individual column. (Your answer should specify, for each column, how many null records are in that column.)
myDF.isnull().sum()
Column Name | Null Count |
---|---|
station_id | 0 |
latitude | 0 |
longitude | 0 |
name | 0 |
observation_time | 0 |
temperature | 4176 |
temperature_high | 4123 |
temperature_low | 4123 |
humidity | 4162 |
solar_radiation | 48104 |
solar_radiation_high | 12828 |
rain | 0 |
rain_inches_last_hour | 0 |
wind_speed_mph | 4032 |
wind_direction_degrees | 95545 |
wind_gust_speed_mph | 0 |
wind_gust_direction_degrees | 94216 |
pressure | 0 |
soil_temp_1 | 114043 |
soil_temp_2 | 114041 |
soil_temp_3 | 114037 |
soil_temp_4 | 114036 |
soil_moist_1 | 114124 |
soil_moist_2 | 114170 |
soil_moist_3 | 114174 |
soil_moist_4 | 114150 |
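To see how the per-column count works, here is a minimal sketch on a small made-up DataFrame. The name `toy` and its values are hypothetical and only stand in for myDF; the second calculation (the fraction of missing values per column) is an optional extra that was not part of the original question.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for myDF; the values are made up.
toy = pd.DataFrame({
    "station_id": [1, 2, 3, 4],
    "temperature": [70.1, np.nan, 68.4, np.nan],
    "soil_temp_1": [np.nan, np.nan, np.nan, 55.0],
})

# isnull() marks each cell True (missing) or False (present);
# sum() then adds up the True values column by column.
null_counts = toy.isnull().sum()

# Taking the mean of the same boolean mask gives the fraction
# of missing values in each column.
null_fraction = toy.isnull().mean()

print(null_counts)
print(null_fraction)
```

In this toy example, `station_id` has 0 nulls, `temperature` has 2, and `soil_temp_1` has 3.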
2b. Now count the total number of null values in the entire data frame myDF. (In other words, add up the values from all of the counts in part 2a.)
myDF.isnull().sum().sum()
1184084
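As a sanity check, the total can also be reproduced by hand from the part 2a counts. The sketch below copies the nonzero counts from the table in part 2a into a plain dictionary (the name `counts_2a` is just for illustration) and adds them up.

```python
# Nonzero null counts copied from the table in part 2a.
# Columns with zero nulls are omitted because they do not change the total.
counts_2a = {
    "temperature": 4176,
    "temperature_high": 4123,
    "temperature_low": 4123,
    "humidity": 4162,
    "solar_radiation": 48104,
    "solar_radiation_high": 12828,
    "wind_speed_mph": 4032,
    "wind_direction_degrees": 95545,
    "wind_gust_direction_degrees": 94216,
    "soil_temp_1": 114043,
    "soil_temp_2": 114041,
    "soil_temp_3": 114037,
    "soil_temp_4": 114036,
    "soil_moist_1": 114124,
    "soil_moist_2": 114170,
    "soil_moist_3": 114174,
    "soil_moist_4": 114150,
}

total = sum(counts_2a.values())
print(total)  # 1184084
```

The result matches the value returned by the chained `.sum().sum()` call: the first `sum()` collapses each column to its count, and the second adds those counts together.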
2c. Drop rows with any null values in myDF. Save the resulting cleaned data set into a new DataFrame called myDF_cleaned.
myDF_cleaned = myDF.dropna()
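With its default arguments, `dropna()` removes every row that has at least one null value and returns a new DataFrame, leaving the original untouched. A minimal sketch on made-up data (the names `toy` and `toy_cleaned` are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; rows 2 and 3 each have one missing value.
toy = pd.DataFrame({
    "station_id": [1, 2, 3, 4],
    "temperature": [70.1, np.nan, 68.4, 71.3],
    "soil_temp_1": [55.2, 54.8, np.nan, 56.0],
})

# Default dropna(): drop any row containing at least one null.
toy_cleaned = toy.dropna()

print(toy.shape)                            # (4, 3): the original is unchanged
print(toy_cleaned.shape)                    # (2, 3): only complete rows remain
print(toy_cleaned["station_id"].tolist())   # [1, 4]
```

Because a single missing value is enough to drop a row, columns with many nulls (such as the soil columns in part 2a) can remove a large share of the rows.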
2d. Just to make sure that you did this properly, check myDF_cleaned carefully: Are there any null values remaining in myDF_cleaned? (There should not be.) How many rows and columns are in myDF_cleaned?
myDF_cleaned.isnull().sum().sum()
0
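The second half of the question (how many rows and columns) can be answered with the `shape` attribute, which holds a `(rows, columns)` tuple. The sketch below wraps both checks in a small helper; `summarize_cleaned` and the toy data are hypothetical names for illustration and are not part of the original project.

```python
import numpy as np
import pandas as pd

def summarize_cleaned(df):
    """Return (remaining_nulls, n_rows, n_cols) for a DataFrame."""
    remaining_nulls = int(df.isnull().sum().sum())
    n_rows, n_cols = df.shape
    return remaining_nulls, n_rows, n_cols

# Hypothetical toy data: only the last row has no missing values.
toy = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 2.0, 6.0]})

result = summarize_cleaned(toy.dropna())
print(result)  # (0, 1, 2)
```

On the real data, calling `myDF_cleaned.shape` would report the row and column counts in the same way.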