TDM 20200: Mapping: Project 9 — Spring 2025
Dataset(s)
In this project we will use the following mapping data:
- /anvil/projects/tdm/data/nps/nps_boundary.shp
- /anvil/projects/tdm/data/boundaries/cb_2018_us_state_20m.shp
- /anvil/projects/tdm/data/aiannh/tl_2023_us_aiannh.shp
Questions
Load Pandas and GeoPandas as follows:
import pandas as pd
import geopandas as gpd
Question 1 (2 pts)
The maps for the National Park Service are available in this shapefile:
/anvil/projects/tdm/data/nps/nps_boundary.shp
If you try to plot all 437 shapes from this data set, you will find that the map is hard to read, because the parks are spread out.
Let’s agree to only plot the parks from the 48 continental United States, plus the District of Columbia.
To do this, we need to remove the parks from Alaska, Hawaii, American Samoa, Guam, Puerto Rico, and the Virgin Islands.
There is also one park we should remove, called American Memorial Park, which is located on Saipan in the Northern Mariana Islands, but its STATE value is listed as NaN. It is OK to drop this park too (the NPS is not finished setting its boundaries yet).
There should be 396 parks remaining. Plot the boundaries of these 396 parks.
- Remove the parks discussed above, then plot the boundaries of the remaining 396 parks.
- Be sure to document your work from Question 1, using some comments and insights about your work.
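The filtering step described in Question 1 can be sketched as follows. This is only a minimal illustration on a toy table, not the official solution: the two-letter codes in EXCLUDED_STATES are an assumption about how the STATE column labels those areas, and the helper name keep_continental and the toy rows are hypothetical. A GeoDataFrame is a subclass of a pandas DataFrame, so the same filtering calls should apply to the real parks data.

```python
import pandas as pd

# Assumption: STATE holds two-letter postal codes, and these six codes
# correspond to Alaska, Hawaii, American Samoa, Guam, Puerto Rico, and
# the Virgin Islands. Check the real data before relying on this list.
EXCLUDED_STATES = ["AK", "HI", "AS", "GU", "PR", "VI"]


def keep_continental(parks, state_col="STATE"):
    """Drop rows whose state is in EXCLUDED_STATES or is missing (NaN)."""
    has_state = parks[state_col].notna()
    is_excluded = parks[state_col].isin(EXCLUDED_STATES)
    return parks[has_state & ~is_excluded]


# Toy stand-in for the parks data (placeholder rows, not real parks).
toy_parks = pd.DataFrame({
    "toy_name": ["park-a", "park-b", "park-c", "park-d", "park-e", "park-f"],
    "STATE": ["CA", "AK", None, "DC", "HI", "IN"],
})

kept = keep_continental(toy_parks)
print(kept)
```

With the real data, you would read the shapefile with gpd.read_file, pass the result to a function like this one, confirm that the number of remaining rows is 396, and then call .boundary.plot() on the filtered data.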
Question 2 (2 pts)
California has the most parks. For this question, make a plot that contains only the parks from California.
(If you look at the maps in Questions 1 and 2 separately, does the map from Question 1 make more sense now? Can you see where California is located in the map from Question 1, on the west side of the map?)
- Make a plot that contains only the parks from California.
- Be sure to document your work from Question 2, using some comments and insights about your work.
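If you want to check the claim that California has the most parks, one possible approach is to count the rows per STATE value and then select the rows for that state. The sketch below uses a toy table with made-up rows (the counts here are not the real counts), and it assumes that California appears in the STATE column as CA.

```python
import pandas as pd

# Toy stand-in for the filtered parks data (made-up rows).
toy_parks = pd.DataFrame({
    "STATE": ["CA", "CA", "IN", "CA", "AZ", "AZ"],
    "geometry": ["shape-1", "shape-2", "shape-3", "shape-4", "shape-5", "shape-6"],
})

# Count how many rows each state has; value_counts sorts from most to fewest.
parks_per_state = toy_parks["STATE"].value_counts()

# The state with the largest count.
top_state = parks_per_state.idxmax()

# Keep only the rows for California (assumed code: "CA").
california_parks = toy_parks[toy_parks["STATE"] == "CA"]

print(parks_per_state)
print(top_state)
```

On the real GeoDataFrame, the filtered result keeps its geometry column, so it can be plotted directly with .plot() or .boundary.plot().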
Question 3 (2 pts)
Now we merge the data from two data sets:
The first data set is the data from Question 1, still excluding the data from Alaska, Hawaii, American Samoa, Guam, Puerto Rico, Virgin Islands, and also excluding the data from the American Memorial Park.
The second data set is from Project 7, using the file:
/anvil/projects/tdm/data/boundaries/cb_2018_us_state_20m.shp
and excluding the data from Alaska, Hawaii, and Puerto Rico (this data set does not have any data from American Samoa, Guam, Virgin Islands).
Rename the column STUSPS from the second data set, and (instead) call this column STATE.
Now combine the data from these two data sets, keeping only the columns STATE and geometry. You can use pd.concat with the option ignore_index=True. Your resulting data frame should have 445 rows and 2 columns (396 rows from the first data set, and 49 rows from the second data set).
If your combined data set is called (for instance) myresults, then you can plot only the boundaries using:
myresults.boundary.plot()
In this way, you will be able to see the boundaries of the 48 continental states (plus the District of Columbia), and also will be able to see the boundaries of the national parks.
If the boundary lines look too thick, you can make the boundary lines thinner, for instance, like this:
myresults.boundary.plot(linewidth=0.2)
- Display the boundaries of the 48 continental states (plus the District of Columbia), and also (on the same plot) the boundaries of the national parks.
- Be sure to document your work from Question 3, using some comments and insights about your work.
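The rename-and-combine step from Question 3 can be sketched on toy tables as follows. The rows and the toy_name column are hypothetical, and the geometry values are placeholder strings rather than real shapes. Only the column names STATE, STUSPS, and geometry come from the project description.

```python
import pandas as pd

# Toy stand-in for the filtered parks data (2 made-up rows).
toy_parks = pd.DataFrame({
    "toy_name": ["park-a", "park-b"],
    "STATE": ["CA", "IN"],
    "geometry": ["park-shape-1", "park-shape-2"],
})

# Toy stand-in for the states data (3 made-up rows).
toy_states = pd.DataFrame({
    "STUSPS": ["CA", "IN", "NV"],
    "geometry": ["state-shape-1", "state-shape-2", "state-shape-3"],
})

# Rename STUSPS so that both tables share the column name STATE.
toy_states = toy_states.rename(columns={"STUSPS": "STATE"})

# Keep only the two shared columns, then stack the rows.
# ignore_index=True renumbers the rows 0, 1, 2, ... in the result.
shared_columns = ["STATE", "geometry"]
myresults = pd.concat(
    [toy_parks[shared_columns], toy_states[shared_columns]],
    ignore_index=True,
)

print(myresults.shape)
```

Checking the shape of the result is a quick sanity test: here it is 5 rows by 2 columns, and with the real data it should be 445 rows by 2 columns.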
Question 4 (2 pts)
Now revisit Question 3, but this time only show the boundaries of the parks from California, and also (on the same plot) the boundary of the state of California. You will see that California has one park with boundaries that poke into Nevada, and also some parks on islands.
- Display the boundaries of the parks from California, and also the boundary of the state of California.
- Be sure to document your work from Question 4, using some comments and insights about your work.
Question 5 (2 pts)
Finally, we look at the boundary files for American Indian/Alaska Native/Native Hawaiian Areas (AIANNH), described here:
The data does not provide a column by which we can separate Alaska and Hawaii from the continental 48 states (plus DC), but there is a natural dividing line at 125 degrees West longitude.
We can get the maximum and minimum values from a polygon as described here:
To try it, first read in the data like this:
aiannhdata = gpd.read_file('/anvil/projects/tdm/data/aiannh/tl_2023_us_aiannh.shp')
(you might want to plot it, to look at it yourself)
In the geometry column, we can use bounds:
aiannhdata['geometry'].bounds
and, in particular, we can get the minimum longitudinal value as follows:
aiannhdata['geometry'].bounds.minx
Now we can extract only the data for the continental 48 states (plus DC) as follows:
aiannhcontinentaldata = aiannhdata[aiannhdata['geometry'].bounds.minx > -125]
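To see why the comparison with -125 works, it may help to compute a bounding box by hand. The sketch below uses three hypothetical toy shapes (their names and coordinates are made up, not real AIANNH boundaries) and builds a table with the same column names that bounds returns (minx, miny, maxx, maxy). West longitudes are stored as negative numbers, so a shape lies entirely east of 125 degrees West when its minx is greater than -125.

```python
import pandas as pd


def bounding_box(coords):
    """Return (minx, miny, maxx, maxy) for a list of (lon, lat) pairs."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


# Hypothetical toy shapes given as (longitude, latitude) corner points.
toy_shapes = {
    "toy_southwest_area": [(-111.0, 35.0), (-109.0, 35.0), (-109.0, 37.0), (-111.0, 37.0)],
    "toy_alaska_area": [(-152.0, 60.0), (-150.0, 60.0), (-150.0, 61.0), (-152.0, 61.0)],
    "toy_hawaii_area": [(-156.0, 20.0), (-155.0, 20.0), (-155.0, 21.0)],
}

# Build a table laid out like the result of the bounds attribute.
toy_bounds = pd.DataFrame(
    [bounding_box(coords) for coords in toy_shapes.values()],
    index=list(toy_shapes.keys()),
    columns=["minx", "miny", "maxx", "maxy"],
)

# True for shapes whose western edge is east of 125 degrees West.
is_continental = toy_bounds["minx"] > -125

continental_bounds = toy_bounds[is_continental]
print(continental_bounds)
```

Only the first toy shape passes the test, because the other two have a western edge (minx) that is less than -125.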
Let’s color the interior of these regions blue:
aiannhcontinentaldata = aiannhcontinentaldata.assign(mycolors='blue')
Now use the states data from Project 7 (only for the 48 continental states plus DC) and set mycolors for all of the state data from Project 7 to be green.
Finally, build a new data frame with the data from Project 7 (with green colors) and the aiannhcontinentaldata data (with blue colors) as follows:
myresults = pd.concat([myproject7statesdata[['geometry','mycolors']], aiannhcontinentaldata[['geometry','mycolors']]], ignore_index=True)
Then plot a map that has the states colored in green, with the AIANNH regions colored in blue, as follows:
myresults.plot(color=myresults['mycolors'])
- Plot a map that has the states colored in green, and has the AIANNH regions colored in blue.
- Be sure to document your work from Question 5, using some comments and insights about your work.
Submitting your Work
Please make sure that you added comments for each question, which explain your thinking about your method of solving each question. Please also make sure that your work is your own work, and that any outside sources (people, internet pages, generative AI, etc.) are cited properly in the project template.
Congratulations! Assuming you’ve completed all the above questions, you are learning to apply your mapping knowledge effectively!
Prior to submitting your work, you need to put your work into the project template, re-run all of the code in your Jupyter notebook, and make sure that the results of running that code are visible in your template. Please check the detailed instructions on how to ensure that your submission is formatted correctly. To download your completed project, you can right-click on the file in the file explorer and click 'download'.
Once you upload your submission to Gradescope, make sure that everything appears as you would expect, to ensure that you don’t lose any points. We hope this project went well, and we look forward to continuing to learn with you on future projects!
- firstname_lastname_project9.ipynb
It is necessary to document your work, with comments about each solution. All of your work needs to be your own work, with citations to any source that you used. Please make sure that any outside sources (people, internet pages, generative AI, etc.) are cited properly in the project template. Please take the time to double check your work. See here for instructions on how to double check this. You will not receive full credit if your |