Think Summer: Project 1 Solutions — 2022
1. How many tables are in the SQL database?
One method is to load the SQL database
%sql sqlite:////anvil/projects/tdm/data/movies_and_tv/imdb.db
and select the names of the tables, as we explained in the introduction to the course. There are six tables: akas
, crew
, episodes
, people
, ratings
, titles
%%sql
SELECT
name
FROM
sqlite_master
WHERE
TYPE IN('table', 'view')
AND name NOT LIKE 'sqlite_%'
ORDER BY
1;
or we can do this in R:
%%R
library(RSQLite)
conn <- dbConnect(RSQLite::SQLite(), "/anvil/projects/tdm/data/movies_and_tv/imdb.db")
dbListTables(conn)
2. Find and print the title_id, rating, and number of votes (votes) for all movies that received at least 2 million votes.
In a second query (and new cell), use the information you found in the previous query to identify the primary_title of these movies.
These are the movies with at least 2 million votes:
%%sql
SELECT * FROM ratings WHERE votes >= 2000000 LIMIT 5;
and then we can lookup their titles:
%%sql
SELECT * FROM titles WHERE title_id = 'tt0111161' OR title_id = 'tt0468569' OR title_id = 'tt1375666' LIMIT 5;
3. Find the primary_title of every movie that is over 2 hours long or that premiered after 1990.
Order the result from newest premier year to oldest, and limit the output to 15 movies. Make sure premiered and runtime_minutes are not NULL.
We just add the conditions to the query about the titles table.
%%sql
SELECT * FROM titles WHERE (type == 'movie') AND (runtime_minutes IS NOT NULL) AND (premiered IS NOT NULL) AND ((runtime_minutes > 120) OR (premiered > 1990)) ORDER BY premiered DESC LIMIT 15;
Now we can find the total number of such movies, using the COUNT
:
%%sql
SELECT COUNT(*) FROM titles WHERE (type == 'movie') AND (runtime_minutes IS NOT NULL) AND (premiered IS NOT NULL) AND ((runtime_minutes > 120) OR (premiered > 1990)) ORDER BY premiered DESC LIMIT 15;
4. What actor has the longest name? Answer this question using just SQL.
We can use the length
function, as follows:
%%sql
SELECT *, length(name) FROM people ORDER BY length(name) DESC LIMIT 5;
5. LIKE is a very powerful tool. You can read about SQLite’s version of LIKE here. Use LIKE and/or R to get a count of how many movies (type='movie') that starts with each letter of the alphabet. Can you think of another way to do this? If so, show us, and explain what you did!
I used the substr
function, which gets a substring from the string. In this case, I used it to just get the first letter of the primary_title
.
%%sql
SELECT substr(primary_title,1,1), count(substr(primary_title,1,1)) FROM titles GROUP BY substr(primary_title,1,1) ORDER BY count(substr(primary_title,1,1)) DESC LIMIT 5;