Basics of Exploratory Data Analysis with the WHIN Dataset

This example is from TDM 102 Project 9 Spring 2024.

These example(s) depend on the database:

  • /anvil/projects/tdm/data/whin/weather.parquet

Learn more about the dataset here.

1aa. Use the method value_counts() to get the number of records for each station.

import pandas as pd
import time
s_t = time.time()
myDF = pd.read_csv('/anvil/projects/tdm/data/whin/weather.csv')
print(time.time()-s_t)
1.546401023864746
myDF['station_id'].value_counts()
Station ID Count

1

71631

20

56917

147

45600

143

45593

146

45579

145

45509

144

45495

142

45395

149

44060

159

43563

153

43559

157

43534

155

42993

160

42810

156

42807

151

42806

164

38791

163

37908

166

30242

167

27463

168

18581

169

14905

172

14812

179

14772

176

14402

173

13648

175

13314

171

13311

import pandas as pd
import time
s_t = time.time()
myDF = pd.read_csv('/anvil/projects/tdm/data/whin/weather.csv')
print(time.time()-s_t)

1ab. Use the method groupby() to get the number of records for each station

myDF.groupby('station_id').size()
Station ID Count

1

71631

20

56917

147

45600

143

45593

146

45579

145

45509

144

45495

142

45395

149

44060

159

43563

153

43559

157

43534

155

42993

160

42810

156

42807

151

42806

164

38791

163

37908

166

30242

167

27463

168

18581

169

14905

172

14812

179

14772

176

14402

173

13648

175

13314

171

13311