TDM 40100: Project 13 — 2022
Motivation: It has been a long semester! In this project, we want to give you some flexibility to explore and utilize some of the skills you’ve previously learned in the course. You will be given 4 options to choose from. Please note that we do not expect perfect submissions, but rather a strong effort in line with a typical project submission.
Context: This is the final project for TDM 40100, where you will choose from 4 options that each exercise some skills from the semester and more.
Scope: Python, sqlite3, playwright, selenium, pandas, matplotlib, and more.
Choose from one of the following options:
Use the provided functions and your sqlite skills to scrape and store 1000+ homes in an area of your choice. Use the data you stored in your database to perform an analysis of your choice. Examples of potentially interesting questions you could ask:
- What percentage of homes have "fishy" histories? For example, a home that stays on the market for too long is viewed as "bad". You may notice homes being marked as "sold" and immediately put back on the market. This refreshes Zillow’s data and makes it look like the home is new to the market, when in fact it is not.
- For your area, what is the average time on the market before a home is sold? What is the average price drop, and after how many days does the price drop occur?
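As a rough illustration of the "fishy history" question, the sketch below flags homes that were relisted shortly after being marked as sold. The `events` table, its columns, the made-up rows, and the 14-day threshold are all assumptions chosen for the example; your own schema and definition of "fishy" will likely differ.

```python
import sqlite3

# Hypothetical schema and made-up rows, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (home_id TEXT, event_date TEXT, event TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("A", "2022-01-10", "listed"),
        ("A", "2022-03-01", "sold"),
        ("A", "2022-03-04", "listed"),  # relisted 3 days after "sold"
        ("B", "2022-02-01", "listed"),
        ("B", "2022-04-15", "sold"),
    ],
)


def relisted_homes(conn, max_gap_days=14):
    """Return ids of homes listed again within max_gap_days after a 'sold' event."""
    rows = conn.execute(
        """
        SELECT DISTINCT s.home_id
        FROM events AS s
        JOIN events AS l
          ON l.home_id = s.home_id
         AND s.event = 'sold'
         AND l.event = 'listed'
         AND julianday(l.event_date) - julianday(s.event_date) BETWEEN 0 AND ?
        ORDER BY s.home_id
        """,
        (max_gap_days,),
    ).fetchall()
    return [row[0] for row in rows]


flagged = relisted_homes(conn)
(total,) = conn.execute("SELECT COUNT(DISTINCT home_id) FROM events").fetchone()
pct_flagged = 100 * len(flagged) / total
print(f"{pct_flagged:.1f}% of homes were relisted soon after a sale")
```

Storing dates as ISO 8601 text lets SQLite's `julianday` compute the gap in days directly in the query.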
Use the provided functions and libraries like typer to build a CLI to make Zillow queries and display data. Please incorporate at least 1 of the following "extra" features:
- Color your output using
- Or containerize your application using singularity.
- Or use sqlite3 to cache the HTML blobs: if the cached blob for a home or query is not older than 1 day, use the cached version instead of making a new request.
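The caching feature described above could be sketched roughly as follows. The table name `cache`, its columns, and the `fetch` callback are hypothetical. In a real project `fetch` would wrap whatever scraping function you were given; here a fake one stands in so that no network request is made.

```python
import sqlite3
import time

MAX_AGE_SECONDS = 24 * 60 * 60  # one day, per the project description


def init_cache(conn):
    """Create the (hypothetical) cache table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cache ("
        "key TEXT PRIMARY KEY, html BLOB NOT NULL, fetched_at REAL NOT NULL)"
    )


def get_html(conn, key, fetch, now=None):
    """Return cached HTML for key if it is at most a day old; otherwise call fetch(key)."""
    now = time.time() if now is None else now
    row = conn.execute(
        "SELECT html, fetched_at FROM cache WHERE key = ?", (key,)
    ).fetchone()
    if row is not None and now - row[1] <= MAX_AGE_SECONDS:
        return row[0]  # cache hit: no new request
    html = fetch(key)
    conn.execute(
        "INSERT OR REPLACE INTO cache (key, html, fetched_at) VALUES (?, ?, ?)",
        (key, html, now),
    )
    conn.commit()
    return html


# Demo with a fake fetcher that records how often it is called.
calls = []


def fake_fetch(key):
    calls.append(key)
    return b"<html>" + key.encode() + b"</html>"


conn = sqlite3.connect(":memory:")
init_cache(conn)
first = get_html(conn, "home-1", fake_fetch, now=0)
second = get_html(conn, "home-1", fake_fetch, now=3600)  # served from cache
```

Passing `now` explicitly is only a testing convenience; in normal use you would omit it and let the function read the clock.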
Abandon the housing altogether and instead have some FIFA fun. Scrape data from fbref.com/en/ and choose from two very similar projects.
Use selenium code to scrape data from fbref.com. Scrape 1000+ structured pieces of information and store it in a database to perform an analysis of your choice. Examples could be:
- Can you find any patterns that may indicate promising players under the age of 21 by looking at currently successful players' data from when they were young?
- What country produces the most talent (by some metric you describe)?
Build a CLI to make queries and display data. Please incorporate at least 1 of the following "extra" features:
- Color your output using
- Or containerize your application using singularity.
- Or use sqlite3 to cache the HTML blobs: if the cached blob for a page or query is not older than 1 day, use the cached version instead of making a new request.
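A query-and-display CLI of the kind described above might be structured like the sketch below. It uses the standard library's `argparse` as a stand-in so the example has no third-party dependencies; the same shape carries over to other CLI libraries. The `players` table, its columns, the subcommand names, and the rows are invented for illustration.

```python
import argparse
import sqlite3


def build_parser():
    parser = argparse.ArgumentParser(description="Query scraped player data.")
    parser.add_argument("--db", default="players.db", help="sqlite3 database file")
    commands = parser.add_subparsers(dest="command", required=True)
    commands.add_parser("count", help="number of stored players")
    top = commands.add_parser("top", help="players with the most goals")
    top.add_argument("-n", "--limit", type=int, default=5)
    return parser


def run(argv, conn=None):
    """Parse argv, run the chosen query, and return the output lines."""
    args = build_parser().parse_args(argv)
    if conn is None:
        conn = sqlite3.connect(args.db)
    if args.command == "count":
        (n,) = conn.execute("SELECT COUNT(*) FROM players").fetchone()
        return [f"{n} players"]
    rows = conn.execute(
        "SELECT name, goals FROM players ORDER BY goals DESC, name LIMIT ?",
        (args.limit,),
    ).fetchall()
    return [f"{name}: {goals}" for name, goals in rows]


# Demo against an in-memory database with made-up rows.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE players (name TEXT, goals INTEGER)")
demo.executemany(
    "INSERT INTO players VALUES (?, ?)",
    [("Player A", 12), ("Player B", 30), ("Player C", 7)],
)
count_lines = run(["count"], demo)
top_lines = run(["top", "-n", "2"], demo)
print("\n".join(count_lines + top_lines))
```

In a script you would call `run(sys.argv[1:])` and print the returned lines. Returning lines rather than printing inside `run` keeps the query logic easy to test and leaves room for a coloring layer on top.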
Have another idea that utilizes the same skillsets? Please post it in Piazza to get approval from Kevin or Dr. Ward.
A markdown cell describing the option(s) you chose to complete for this project and why you chose it/them.
If you chose to scrape 1000+ bits of data, 2 SQL cells: 1 that demonstrates a sample of your data (for instance 5 rows printed out), and 1 that shows that you’ve scraped 1000+ records.
If you chose to scrape 1000+ bits of data, an analysis complete with your problem statement, how you chose to solve the problem, and your code and analysis, with at least 2 included visualizations, and a conclusion.
If you chose to build a CLI, a markdown cell describing the CLI and how to use it, and the options it has.
Screenshots demonstrating the capabilities of your CLI and the extra feature(s) you chose to implement.
Please double-check that your submission is complete and contains all of your code and output before submitting. If you are on a spotty internet connection, we recommend downloading your submission after submitting it to make sure that what you think you submitted is what you actually submitted.
In addition, please review our submission guidelines before submitting your project.