Basics of Exploratory Data Analysis with the WHIN Dataset
This example is from TDM 102 Project 9 Spring 2024.
These example(s) depend on the database:
- /anvil/projects/tdm/data/whin/weather.parquet
Learn more about the dataset here.
1aa. Use the method value_counts() to get the number of records for each station.
import time
import pandas as pd

# Record the start time, load the weather data, then print the elapsed seconds
s_t = time.time()
myDF = pd.read_csv('/anvil/projects/tdm/data/whin/weather.csv')
print(time.time() - s_t)
1.546401023864746
myDF['station_id'].value_counts()
Station ID | Count |
---|---|
1 | 71631 |
20 | 56917 |
147 | 45600 |
143 | 45593 |
146 | 45579 |
145 | 45509 |
144 | 45495 |
142 | 45395 |
149 | 44060 |
159 | 43563 |
153 | 43559 |
157 | 43534 |
155 | 42993 |
160 | 42810 |
156 | 42807 |
151 | 42806 |
164 | 38791 |
163 | 37908 |
166 | 30242 |
167 | 27463 |
168 | 18581 |
169 | 14905 |
172 | 14812 |
179 | 14772 |
176 | 14402 |
173 | 13648 |
175 | 13314 |
171 | 13311 |
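To see how value_counts() orders its result without loading the full file, here is a minimal sketch on a tiny hand-made DataFrame. The `toy` data and its station IDs are hypothetical stand-ins, not values read from the WHIN dataset; only the `station_id` column name is taken from the example above.

```python
import pandas as pd

# Hypothetical toy data standing in for the WHIN weather records
toy = pd.DataFrame({"station_id": [1, 1, 1, 20, 20, 147]})

# value_counts() counts the rows per distinct value and sorts
# the result from the most frequent value to the least frequent
counts = toy["station_id"].value_counts()
print(counts)

# normalize=True returns each station's share of all records instead
shares = toy["station_id"].value_counts(normalize=True)
print(shares)
```

Because the result is sorted by count, the station with the most records appears first, which is the ordering seen in the table above.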
1ab. Use the method groupby() to get the number of records for each station.
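# size() orders its result by station_id; sort by count to match the listing shown here
myDF.groupby('station_id').size().sort_values(ascending=False)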
Station ID | Count |
---|---|
1 | 71631 |
20 | 56917 |
147 | 45600 |
143 | 45593 |
146 | 45579 |
145 | 45509 |
144 | 45495 |
142 | 45395 |
149 | 44060 |
159 | 43563 |
153 | 43559 |
157 | 43534 |
155 | 42993 |
160 | 42810 |
156 | 42807 |
151 | 42806 |
164 | 38791 |
163 | 37908 |
166 | 30242 |
167 | 27463 |
168 | 18581 |
169 | 14905 |
172 | 14812 |
179 | 14772 |
176 | 14402 |
173 | 13648 |
175 | 13314 |
171 | 13311 |
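The two approaches give the same counts and differ only in how the result is ordered. A minimal sketch of that comparison, again on hypothetical toy data rather than the WHIN file:

```python
import pandas as pd

# Hypothetical toy data: station 147 has the most records, station 1 the fewest
toy = pd.DataFrame({"station_id": [147, 147, 147, 1, 20, 20]})

# groupby().size() returns one count per group, ordered by the group key
by_group = toy.groupby("station_id").size()

# value_counts() returns the same counts, ordered from most to least frequent
by_count = toy["station_id"].value_counts()

# The counts agree station by station
print(by_group.to_dict() == by_count.to_dict())

# Sorting the groupby result by count reproduces the value_counts() order
by_group_sorted = by_group.sort_values(ascending=False)
print(by_group_sorted.index.tolist())
```

groupby() is the more general tool: the same grouping can feed other aggregations such as mean() or max(), while value_counts() is a shortcut for the counting case.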