Think Summer: Project 1 Solutions — 2022

1. How many tables are in the SQL database?

One method is to load the SQL database

%sql sqlite:////anvil/projects/tdm/data/movies_and_tv/imdb.db

and select the names of the tables, as we explained in the introduction to the course. There are six tables: akas, crew, episodes, people, ratings, titles

    TYPE IN('table', 'view')
    AND name NOT LIKE 'sqlite_%'

or we can do this in R:

conn <- dbConnect(RSQLite::SQLite(), "/anvil/projects/tdm/data/movies_and_tv/imdb.db")

2. Find and print the title_id, rating, and number of votes (votes) for all movies that received at least 2 million votes.

In a second query (and new cell), use the information you found in the previous query to identify the primary_title of these movies.

These are the movies with at least 2 million votes:

SELECT * FROM ratings WHERE votes >= 2000000 LIMIT 5;

and then we can lookup their titles:

SELECT * FROM titles WHERE title_id = 'tt0111161' OR title_id = 'tt0468569' OR title_id = 'tt1375666' LIMIT 5;

3. Find the primary_title of every movie that is over 2 hours long or that premiered after 1990.

Order the result from newest premier year to oldest, and limit the output to 15 movies. Make sure premiered and runtime_minutes are not NULL.

We just add the conditions to the query about the titles table.

SELECT * FROM titles WHERE (type == 'movie') AND (runtime_minutes IS NOT NULL) AND (premiered IS NOT NULL) AND ((runtime_minutes > 120) OR (premiered > 1990)) ORDER BY premiered DESC LIMIT 15;

Now we can find the total number of such movies, using the COUNT:

SELECT COUNT(*) FROM titles WHERE (type == 'movie') AND (runtime_minutes IS NOT NULL) AND (premiered IS NOT NULL) AND ((runtime_minutes > 120) OR (premiered > 1990)) ORDER BY premiered DESC LIMIT 15;

4. What actor has the longest name? Answer this question using just SQL.

We can use the length function, as follows:

SELECT *, length(name) FROM people ORDER BY length(name) DESC LIMIT 5;

5. LIKE is a very powerful tool. You can read about SQLite’s version of LIKE here. Use LIKE and/or R to get a count of how many movies (type='movie') that starts with each letter of the alphabet. Can you think of another way to do this? If so, show us, and explain what you did!

I used the substr function, which gets a substring from the string. In this case, I used it to just get the first letter of the primary_title.

SELECT substr(primary_title,1,1), count(substr(primary_title,1,1)) FROM titles GROUP BY substr(primary_title,1,1) ORDER BY count(substr(primary_title,1,1)) DESC LIMIT 5;