TDM 20100: Project 11 — SQL
Motivation: Now we will apply our SQL skills by studying movies and TV shows.
Context: The Internet Movie DataBase www.imdb.com provides data tables here: datasets.imdbws.com which we have stored in a database for you here: /anvil/projects/tdm/data/movies_and_tv/imdb2024.db
Scope: There are 7 tables to get familiar with: akas
, basics
, crew
, episode
, name
, principals
, ratings
Dataset(s)
This project will use the following dataset:
-
/anvil/projects/tdm/data/movies_and_tv/imdb2024.db
(Internet Movie DataBase (IMDB))
Our page in The Examples Book about SQL (in general) is given here: the-examples-book.com/tools/SQL/
Questions
Using the seminar
kernel, if you run this line in a cell by itself:
%sql sqlite:////anvil/projects/tdm/data/movies_and_tv/imdb2024.db
then you will have the movies and TV database loaded.
If your kernel dies, then you need to re-run the line above. You also need to re-run this line at the start of any new Jupyter Lab session. |
The tables in this database are:
akas
basics
crew
episode
name
principals
ratings
Question 1 (2 pts)
-
From the
basics
table, display the entry for Friends. (The title constant for Friends istt0108778
. Notice that this corresponds to the IMDB webpage for Friends: www.imdb.com/title/tt0108778 from IMDB.) -
Find all of the entries of the
principals
table that correspond to people in Friends. -
Use the
episode
table to discover how many episodes occurred during each season of Friends. For each season, print the season number and the number of episodes in that season.
Notice that the |
-
From the
basics
table, display the entry for Friends. -
Find all of the entries of the
principals
table that correspond to people in Friends. -
Use the
episode
table to discover how many episodes occurred during each season of Friends. For each season, print the season number and the number of episodes in that season.
Question 2 (2 pts)
Join the ratings
and the basics
table, to find the 13 titles that each have more than 2 million ratings. For each such title, output these values: tconst
, averageRating
, numVotes
, primaryTitle
, startYear
, runtimeMinutes
, and genres
-
For each of the 13 titles that each have more than 2 million ratings, output these values:
tconst
,averageRating
,numVotes
,primaryTitle
,startYear
,runtimeMinutes
, andgenres
Question 3 (2 pts)
Using the startYear
values from the basics
table, find the total number of entries in each startYear
.
-
For each
startYear
value from thebasics
table, print thestartYear
and the total number of entries in corresponding to thatstartYear
.
Question 4 (2 pts)
-
From the
name
table, find the nconst value for Emma Watson. (Notice that there are several entries with this name, but probably only one of them is the one that you want to analyze.) -
How many entries in the
principals
table correspond to Emma Watson (using only the correct value ofnconst
that you found in part a)?
-
From the
name
table, find the nconst value for Emma Watson. (Although several values appear, just find the 1 value that is correct for her.) -
How many entries in the
principals
table correspond to Emma Watson?
Question 5 (2 pts)
Join the basics
and the ratings
table to find the 3 entries that have startYear = 2024
and numVotes > 100000
and averageRating > 8
. (Print all of the columns from both tables, for these 3 entries.)
-
Join the
basics
and theratings
table to find the 3 entries that havestartYear = 2024
andnumVotes > 100000
andaverageRating > 8
. (Print all of the columns from both tables, for these 3 entries.)
Submitting your Work
We see that the SQL skills that we learned for the Lahman baseball database are directly applicable to analyzing the movies and TV database too! It is a good feeling to be able to apply what we have learned in a new setting!
-
firstname-lastname-project11.ipynb
You must double check your You will not receive full credit if your |