Filtering and Selecting with the Flight Dataset

This example is from TDM 102 Project 10 Spring 2024.

This example depends on the following dataset:

  • /anvil/projects/tdm/data/flights/2014.csv

Learn more about the dataset here.

5a. Create 3 numpy arrays for the DepDelay, ArrDelay, and Distance data

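import pandas as pd
import numpy as np

# Load the 2014 flights data (skip this if df is already in memory)
df = pd.read_csv('/anvil/projects/tdm/data/flights/2014.csv')

# Convert each column of interest to a numpy array
dep_delay = df['DepDelay'].to_numpy()
arr_delay = df['ArrDelay'].to_numpy()
distance = df['Distance'].to_numpy()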

5b. Filter the Distance array so that it keeps only the distances of flights that satisfy the condition 'departure delay > 60 minutes or arrival delay > 60 minutes'

condition_np = (dep_delay > 60) | (arr_delay > 60)
filtered_distance = distance[condition_np]
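
To see how the boolean mask behaves, here is a small sketch on made-up values (the numbers are invented for illustration and are not taken from the flights data). It also shows that a comparison against NaN evaluates to False, so a flight with a missing delay is selected only when its other delay exceeds 60.

```python
import numpy as np

# Toy values, invented for illustration only (not from 2014.csv)
dep_delay = np.array([5.0, 75.0, np.nan, 10.0, 90.0])
arr_delay = np.array([0.0, 20.0, 65.0, np.nan, 100.0])
distance = np.array([300, 500, 800, 1200, 2000])

# Element-wise OR of two boolean arrays. The parentheses are required
# because | binds more tightly than > in Python.
mask = (dep_delay > 60) | (arr_delay > 60)

# Indexing with the mask keeps only the positions where mask is True
# (here: positions 1, 2, and 4)
selected = distance[mask]
print(mask)
print(selected)
```

Note that `|` is used instead of `or`, since `or` cannot combine two arrays element by element.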

5c. Use numpy mean() to calculate the average of the distances from question 5b. (Your solution should be the same as the average you obtained in question 4b.)

average_distance_numpy = np.mean(filtered_distance)
print("Average distance with numpy:", average_distance_numpy)
Average distance with numpy: 793.1002773092284
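
Question 4b is not shown on this page. Assuming it computed the same average with pandas boolean indexing, the sketch below checks on a toy DataFrame (invented values, with column names matching the flights data) that the pandas and numpy routes agree.

```python
import numpy as np
import pandas as pd

# Toy frame, invented for illustration (not the real flights data)
toy = pd.DataFrame({
    "DepDelay": [5.0, 75.0, 10.0, 90.0],
    "ArrDelay": [0.0, 20.0, 65.0, 100.0],
    "Distance": [300, 500, 800, 2000],
})

# pandas route: boolean indexing on the DataFrame, then Series.mean()
cond = (toy["DepDelay"] > 60) | (toy["ArrDelay"] > 60)
mean_pandas = toy.loc[cond, "Distance"].mean()

# numpy route: the same mask applied to a plain array, then np.mean()
dist = toy["Distance"].to_numpy()
mean_numpy = np.mean(dist[cond.to_numpy()])

print(mean_pandas, mean_numpy)
```

One difference worth knowing: Series.mean() skips NaN values by default, while np.mean() returns nan if any selected element is NaN. The two only match when the selected distances contain no missing values.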

5d. How long does the program take to get the average?

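import pandas as pd
import numpy as np
import time

s_t = time.time()

# Timed section: the filtering and averaging steps from 5b and 5c
# (dep_delay, arr_delay, and distance are the arrays created in 5a)
condition_np = (dep_delay > 60) | (arr_delay > 60)
average_distance_numpy = np.mean(distance[condition_np])

print(f"used time {time.time()-s_t}")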
used time 0.014527320861816406
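
As a variation, the sketch below times the same kind of filter-and-average step with time.perf_counter(), which is intended for measuring short durations. It runs on randomly generated stand-in arrays (the sizes and distributions are assumptions, not properties of the flights data), so the elapsed time will differ from the figure reported above.

```python
import time
import numpy as np

# Synthetic stand-in data (random values, not the real flights file)
rng = np.random.default_rng(0)
n = 1_000_000
dep_delay = rng.normal(10, 40, n)
arr_delay = rng.normal(5, 45, n)
distance = rng.integers(50, 3000, n)

# Start the clock right before the work being measured
start = time.perf_counter()
mask = (dep_delay > 60) | (arr_delay > 60)
avg = distance[mask].mean()
elapsed = time.perf_counter() - start

print(f"average distance: {avg:.1f}, elapsed: {elapsed:.4f} s")
```

Only the lines between the two perf_counter() calls are timed, so data generation (or, in the real project, loading the CSV) is excluded from the measurement.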