Think Summer: Project 3 — 2022

Submission

Students need to submit the following file by 10:00PM EST through Gradescope inside Brightspace.

  1. A Jupyter notebook (a .ipynb file).

We’ve provided you with a template notebook for you to use. Please carefully read this section to get started.

When you are finished with the project, please make sure to run every cell in the notebook prior to submitting. To do this click Run  Run All Cells. Next, to export your notebook (your .ipynb file), click on File  Download, and download your .ipynb file.

Questions

Question 1

A primary key is a field in a table which uniquely identifies a row in the table. Primary keys must be unique values. This is enforced at the database level.

A foreign key is a field whose value matches a primary key in a different table.

Examine all six of the tables in the imdb database: akas, crew, episodes, people, ratings, titles. Identify each of the primary keys in each table, and identify each of the foreign keys in each table.

Relevant topics: primary and foreign keys

Items to submit
  • List any primary keys in the tables. (1 pt)

  • List any foreign keys in the tables. (1 pt)

  • Any code you used to answer this question.

Question 2

If you paste a title_id to the end of the following url, it will pull up the page for the title. For example, www.imdb.com/title/tt0413573 leads to the page for the TV series Grey’s Anatomy. Write a SQL query to confirm that the title_id "tt0413573" does indeed belong to Grey’s Anatomy. Then browse imdb.com and find your favorite TV show. Get the title_id from the url of your favorite TV show, and run the following query to confirm that the TV show is in our database.

SELECT * FROM titles WHERE title_id='<title id here>';
Make sure to replace "<title id here>" with the title_id of your favorite show. If your show does not appear, or has only a single season, pick another show until you find one we have in our database (that has multiple seasons).

Relevant topics: SQL, queries

Items to submit
  • SQL query used to confirm that title_id "tt0413573" does indeed belong to Grey’s Anatomy. (.5 pts)

  • The output of the query. (.5 pt)

  • The title_id of your favorite TV show. (.5 pts)

  • SQL query used to confirm the title_id for your favorite TV show. (.5 pts)

Question 3

The episode_title_id column in the episodes table references titles of individual episodes of a TV series. The show_title_id references the titles of the show itself. With that in mind, write a query that gets a list of all of the episodes_title_id’s (found in the `episodes table), with the associated primary_title (found in the titles table) for each episode of Grey’s Anatomy.

This video demonstrates how to extract titles of episodes in the imdb database.

Relevant topics: SQL, queries, joins

Items to submit
  • SQL query used to answer the question. (3 pts)

  • Output from running the SQL query. (2 pts)

Question 4

Joins are a critical concept to understand. They appear everywhere where relational data is found. In R, the merge function performs the same operations as joins. In python’s pandas package the merge method for the DataFrame object performs the same operations. Take some time to read this section.

In question 2 from the previous project, we asked you to use the ratings table to discover how many films have a rating of at least 8 and at least 50000 votes. You may have noticed, while you can easily do that, the end result is not human understandable. We see that there are films with those features but we don’t know what film title_id "tt0010323" is for. This is a great example where a simple join can answer this question for us.

Write a query that prints the primary_title, rating, and votes for all films with a rating of at least 8 and at least 50000 votes. Like in the previous version of this question, limit your output to 15 results.

Relevant topics: SQL, queries, joins

Items to submit
  • SQL query used to answer the question. (3 pts)

  • Output from running the SQL query.

The following are challenge questions and are worth 0 points. If you get done early give them a try!

Question 5

We want to write a query that returns the title and rating of the highest rated episode of your favorite TV show, which you chose in question 2. In order to do so, we will break the task into two parts in (5) and (6). First, write a query that returns a list of just episode_title_ids (found in the episodes table), with the associated primary_title (found in the titles table) for each episode.

Relevant topics: SQL, queries, joins

Items to submit
  • SQL query used to answer the question.

  • Output from running the SQL query.

Question 6

Write a query that adds the rating to the end of each episode. To do so, use the query you wrote in (5) as a subquery. Which episode has the highest rating? Is it also your favorite episode?

Relevant topics: SQL, queries, joins

Items to submit
  • SQL query used to answer the question.

  • Output from running the SQL query.