Filtering and Selecting with the Flight Dataset
This example is from TDM 102 Project 10 Spring 2024.
These example(s) depend on the database:
-
/anvil/projects/tdm/data/flights/2014.csv
Learn more about the dataset here.
5a. Create 3 numpy arrays for the DepDelay, ArrDelay, and Distance data
import pandas as pd
import numpy as np
dep_delay = df['DepDelay'].to_numpy()
arr_delay = df['ArrDelay'].to_numpy()
distance = df['Distance'].to_numpy()
5b. Filter the numpy array with the Distance stored in it, so that you have only the Distances that satisfy the condition that 'departure delay > 60 minutes or arrival delay > 60 minutes'
condition_np = (dep_delay > 60) | (arr_delay > 60)
filtered_distance = distance[condition_np]
5c. Use numpy mean() to calculate the average distances from question 5b. (Your solution should be the same as the average you obtained in question 4b.)
average_distance_numpy = np.mean(filtered_distance)
print("Average distance with numpy:", average_distance_numpy)
Average distance with numpy: 793.1002773092284
5d. How long does the program take to get the average?
import pandas as pd
import numpy as np
import time
s_t = time.time()
print(f"used time {time.time()-s_t}")
used time 0.014527320861816406