TDM 20200: Mapping: Project 9 — Spring 2025

Motivation: We continue to learn how to plot maps in Python.

Context: We will focus on a broader selection of maps.

Scope: Mapping

Learning Objectives:

We continue to learn how to make maps in Python

Make sure to read about, and use the template found here, and the important information about project submissions here.

Dataset(s)

In this project we will use the following mapping data:

/anvil/projects/tdm/data/nps/nps_boundary.shp
/anvil/projects/tdm/data/boundaries/cb_2018_us_state_20m.shp
/anvil/projects/tdm/data/aiannh/tl_2023_us_aiannh.shp

Questions

Load Pandas as Geopandas as follows:

import pandas as pd
import geopandas as gpd

Question 1 (2 pts)

The maps for the National Park Service are available in this shapefile:

/anvil/projects/tdm/data/nps/nps_boundary.shp

If you try to plot all 437 shape files from this data set, you will find that the map is hard to see, because the parks are spread out.

Let’s agree to only plot the parks from the 48 continental United States, plus the District of Columbia.

To do this, we need to remove the parks from Alaska, Hawaii, American Samoa, Guam, Puerto Rico, and the Virgin Islands.

There is also one park we should remove, called American Memorial Park, which is located in American Samoa but the STATE value is listed as NaN. It is OK to drop this park too (the NPS is not finished setting its boundaries yet).

There should be 396 parks remaining. Plot the boundaries of these 396 parks.

Deliverables

Remove the parks discussed above, then plot the boundaries of the remaining 396 parks.
Be sure to document your work from Question 1, using some comments and insights about your work.

Question 2 (2 pts)

California has the most parks. For this question, make a plot that contains only the parks from California.

(If you look at the maps in Questions 1 and 2 separately, does the map from Question 1 make more sense now? Can you see where California is located in the map from Question 1, on the west side of the map?)

Deliverables

Make a plot that contains only the parks from California
Be sure to document your work from Question 2, using some comments and insights about your work.

Question 3 (2 pts)

Now we merge the data from two data sets:

The first data set is the data from Question 1, still excluding the data from Alaska, Hawaii, American Samoa, Guam, Puerto Rico, Virgin Islands, and also excluding the data from the American Memorial Park.

The second data set is from Project 7, using the file: /anvil/projects/tdm/data/boundaries/cb_2018_us_state_20m.shp and excluding the data from Alaska, Hawaii, and Puerto Rico (this data set does not have any data from American Samoa, Guam, Virgin Islands).

Rename the column STUSPS from the second data set, and (instead) call this column STATE.

Now combine the data from these two data sets, keeping only the columns STATE and geometry. You can use pd.concat with the option ignore_index=True. Your resulting data frame should have 445 rows and 2 columns (396 rows from the first data set, and 49 rows from the second data set).

If your combined data set is called (for instance) by the name myresults then you can plot only the boundaries using:

myresults.boundary.plot()

In this way, you will be able to see the boundaries of the 48 continental states (plus the District of Columbia), and also will be able to see the boundaries of the national parks.

If the boundary lines look too thick, you can make the boundary lines thinner, for instance, like this:

myresults.boundary.plot(linewidth=0.2)

Deliverables

Display the boundaries of the 48 continental states (plus the District of Columbia), and also (on the same plot) the boundaries of the national parks.
Be sure to document your work from Question 3, using some comments and insights about your work.

Question 4 (2 pts)

Now revisit Question 3, but this time only show the boundaries of the parks from California, and also (on the same plot) the boundary of the state of California. You will see that California has one park with boundaries that poke into Nevada, and also some parks on islands.

Deliverables

Display the boundaries of the parks from California, and also the boundary of the state of California.
Be sure to document your work from Question 4, using some comments and insights about your work.

Question 5 (2 pts)

Finally, we look at the boundary files for American Indian/Alaska Native/Native Hawaiian Areas (AIANNH), described here:

catalog.data.gov/dataset/tiger-line-shapefile-current-nation-u-s-american-indian-alaska-native-native-hawaiian-areas-aia

The data does not provide a column by which we can separate Alaska and Hawaii from the continental 48 states (plus DC). BUT there is a natural dividing line at 125 degrees West longitude.

We can get the maximum and minimum values from a polygon as described here:

gis.stackexchange.com/questions/257807/get-min-max-lat-and-long-values-from-geodataframe

To try it, first read in the data like this:

aiannhdata = gpd.read_file('/anvil/projects/tdm/data/aiannh/tl_2023_us_aiannh.shp')

(you might want to plot it, to look at it yourself)

In the geomtetry column, we can use bounds:

aiannhdata['geometry'].bounds

and, in particular, we can get the minimum longitudinal value as follows:

aiannhdata['geometry'].bounds.minx

Now we can extract only the data for the continental 48 states (plus DC) as follows:

aiannhcontinentaldata = aiannhdata[aiannhdata['geometry'].bounds.minx > -125]

Let’s color the interior of these regions blue:

aiannhcontinentaldata = aiannhcontinentaldata.assign(mycolors='blue')

Now use the states data from Project 7 (only for the 48 continental states plus DC) and set mycolors for all of the state data from Project 7 to be green.

Finally, build a new data frame with the data from Project 7 (with green colors) and the aiannhcontinentaldata data (with blue colors) as follows:

myresults = pd.concat([myproject7statesdata[['geometry','mycolors']], aiannhcontinentaldata[['geometry','mycolors']]], ignore_index=True)

and finally plot a map that has the states colored in green, with the AIANNH regions colored in blue, as follows:

myresults.plot(color = myresults['mycolors'])

Deliverables

Plot a map that has the states colored in green, and has the AIANNH regions colored in blue.
Be sure to document your work from Question 5, using some comments and insights about your work.

Submitting your Work

Please make sure that you added comments for each question, which explain your thinking about your method of solving each question. Please also make sure that your work is your own work, and that any outside sources (people, internet pages, generating AI, etc.) are cited properly in the project template.

Congratulations! Assuming you’ve completed all the above questions, you are learning to apply your web scraping knowledge effectively!

Prior to submitting your work, you need to put your work into the project template, and re-run all of the code in your Jupyter notebook and make sure that the results of running that code is visible in your template. Please check the detailed instructions on how to ensure that your submission is formatted correctly. To download your completed project, you can right-click on the file in the file explorer and click 'download'.

Once you upload your submission to Gradescope, make sure that everything appears as you would expect to ensure that you don’t lose any points. We hope your first project with us went well, and we look forward to continuing to learn with you on future projects!!

Items to submit

firstname_lastname_project9.ipynb

It is necessary to document your work, with comments about each solution. All of your work needs to be your own work, with citations to any source that you used. Please make sure that your work is your own work, and that any outside sources (people, internet pages, generating AI, etc.) are cited properly in the project template.

You must double check your .ipynb after submitting it in gradescope. A very common mistake is to assume that your .ipynb file has been rendered properly and contains your code, markdown, and code output even though it may not.

Please take the time to double check your work. See here for instructions on how to double check this.

You will not receive full credit if your .ipynb file does not contain all of the information you expect it to, or if it does not render properly in Gradescope. Please ask a TA if you need help with this.