TDM 20200: Mapping: Project 11 — Spring 2025

Motivation: We will now use our knowledge of plotting maps in Python.

Context: We will leverage what we know about several data sets, to make some maps.

Scope: Mapping

Learning Objectives:
  • We continue to learn how to make maps in Python

Make sure to read about, and use the template found here, and the important information about project submissions here.

Dataset(s)

In this project we will use the following mapping data:

  • /anvil/projects/tdm/data/taxi/yellow/

Questions

Load Pandas as Geopandas as follows:

import pandas as pd
import geopandas as gpd

Question 1 (2 pts)

Consider the flight data contained in the files from

/anvil/projects/tdm/data/taxi/yellow/yellow_tripdata_2009-01.csv

through

/anvil/projects/tdm/data/taxi/yellow/yellow_tripdata_2009-12.csv

Write a function called extractdate that accepts a 4-digit year (as a string), and a 2-digit month (as a string, with a leading 0 if needed), and a 2-digit day (as a string, with a leading 0 if needed). For the given year and month, the function should read in 3 columns from the correct year-and-month yellow taxi cab data file, namely, extracting only column 1 (which is the 2nd column, for "Trip_Pickup_DateTime"), and column 5 (which is the 6th column, for "Start_Lon"), and column 6 (which is the 7th column, for "Start_Lat").

For instance, extractdate("2009", "05", "29") should read in the 3 columns above from the file /anvil/projects/tdm/data/taxi/yellow/yellow_tripdata_2009-05.csv.

Deliverables
  • Create the function extractdate described above.

  • Be sure to document your work from Question 1, using some comments and insights about your work.

Question 2 (2 pts)

Now revise the function extractdate, so that the function creates a small data frame, which has only the taxi cab rides that started on the given year, month, day triple that was inputted to the function.

For instance, extractdate("2009", "05", "29") should return a data frame with 523947 rows and 3 columns,

and extractdate("2009", "02", "14") should return a data frame with 539684 rows and 3 columns.

Hint: You might need to check whether the year is correct, e.g.,

pd.to_datetime(myDF['Trip_Pickup_DateTime']).dt.year == int(myyear)

and the month is correct, e.g.,

pd.to_datetime(myDF['Trip_Pickup_DateTime']).dt.month == int(mymonth)

and the day is correct, e.g.,

pd.to_datetime(myDF['Trip_Pickup_DateTime']).dt.day == int(mydate)

Deliverables
  • Revise the function extractdate as described above.

  • Be sure to document your work from Question 2, using some comments and insights about your work.

Question 3 (2 pts)

Now add a line in your function to convert the "Start_Lon" and "Start_Lat" values into geopandas geometries, like this:

gdf = gpd.GeoDataFrame(goodDF, geometry=gpd.points_from_xy(goodDF.Start_Lon, goodDF.Start_Lat), crs="NAD83")

If you try to plot gdf like this: gdf.plot() it will look bad, because there is some erroneous latitude and longitude values, which we will remove in Question 4 (below).

Deliverables
  • Continue to revise the function extractdate as described above.

  • Be sure to document your work from Question 3, using some comments and insights about your work.

Question 4 (2 pts)

Now further refine your function, so that you are only including latitude and longitude values as described here:

In other words, we only want to preserve the values for which the longitude is between -74.27 and -73.68 (inclusive, i.e., including these extreme values), and for which the latitude is between 40.49 and 40.92 (inclusive, i.e., including these extreme values).

Deliverables
  • Further revise the function extractdate as described above.

  • Be sure to document your work from Question 4, using some comments and insights about your work.

Question 5 (2 pts)

Finally, instead of just using something like goodgdf.plot() to display the locations of the starting points of the cab rides, make sure that the points are drawn with small dots, for instance, using something like goodgdf.plot(markersize = 0.1).

Test your maps for May 29, 2009 using extractdate("2009", "05", "29") and also for February 14, 2009 using extractdate("2009", "02", "14"). The resulting maps will look very similar (showing the shape of New York City from these cab rides) but will have small differences.

Deliverables
  • Plot the starting points of the cab rides on May 29, 2009, and on February 14, 2009.

  • Be sure to document your work from Question 5, using some comments and insights about your work.

Submitting your Work

Please make sure that you added comments for each question, which explain your thinking about your method of solving each question. Please also make sure that your work is your own work, and that any outside sources (people, internet pages, generating AI, etc.) are cited properly in the project template.

Congratulations! Assuming you’ve completed all the above questions, you are learning to apply your web scraping knowledge effectively!

Prior to submitting your work, you need to put your work into the project template, and re-run all of the code in your Jupyter notebook and make sure that the results of running that code is visible in your template. Please check the detailed instructions on how to ensure that your submission is formatted correctly. To download your completed project, you can right-click on the file in the file explorer and click 'download'.

Once you upload your submission to Gradescope, make sure that everything appears as you would expect to ensure that you don’t lose any points. We hope your first project with us went well, and we look forward to continuing to learn with you on future projects!!

Items to submit
  • firstname_lastname_project11.ipynb

It is necessary to document your work, with comments about each solution. All of your work needs to be your own work, with citations to any source that you used. Please make sure that your work is your own work, and that any outside sources (people, internet pages, generating AI, etc.) are cited properly in the project template.

You must double check your .ipynb after submitting it in gradescope. A very common mistake is to assume that your .ipynb file has been rendered properly and contains your code, markdown, and code output even though it may not.

Please take the time to double check your work. See here for instructions on how to double check this.

You will not receive full credit if your .ipynb file does not contain all of the information you expect it to, or if it does not render properly in Gradescope. Please ask a TA if you need help with this.