Think Summer: Project 1 — 2022

Submission

Students need to submit the following file by 10:00PM EST through Gradescope inside Brightspace.

  1. A Jupyter notebook (a .ipynb file).

We’ve provided you with a template notebook for you to use. Please carefully read this section to get started.

When you are finished with the project, please make sure to run every cell in the notebook prior to submitting. To do this click Run  Run All Cells. Next, to export your notebook (your .ipynb file), click on File  Download, and download your .ipynb file.

Questions

Question 1

Read some information on SQL here. Explore the 6 tables in the database, and state 1 interesting fact that you find about at least one of the tables.

Normally, when using SQLite, the easiest way to display the tables in the database is by running .table or .tables. This is SQLite-specific behavior and therefore cannot be used in our Jupyter Lab environment. To show the tables using an R cell, run the following.

dbListTables(conn)

To list the tables using a SQL cell, run the following.

SELECT
    name
FROM
    sqlite_master
WHERE
    TYPE IN('table', 'view')
    AND name NOT LIKE 'sqlite_%'
ORDER BY
    1;

Relevant topics: SQL, queries

Items to submit
  • Code used to explore the database. (1 pt)

  • A sentence describing at least 1 interesting fact about at least one of the tables. (1 pt)

Question 2

Find and print the title_id, rating, and number of votes (votes) for all movies that received at least 2 million votes. In a second query (and new cell), use the information you found in the previous query to identify the primary_title of these movies.

Use the ratings table for the first query, and the titles table for the second.

Relevant topics: SQL, queries

Items to submit
  • SQL queries used to solve this problem. (2 pts)

  • Output from running the SQL query. (1 pt)

Question 3

Find the primary_title of every movie that is over 2 hours long or that premiered after 1990. Order the result from newest premier year to oldest, and limit the output to 15 movies. Make sure premiered and runtime_minutes are not NULL. After displaying these 15 movies, run the query again in a second cell, but this time only display the number of such movies.

Relevant topics: SQL, queries, order of operations

Make sure the type column is "movie".

Be careful! Order of operations is important here.
Items to submit
  • SQL query used to solve this problem. (3 pts)

  • Explain in 1-2 sentences why order of operations is important here. (1 pt)

  • Output from running the SQL query. (1 pt)

The following are challenge questions and are worth 0 points. If you get done early give them a try!

Question 4

What actor has the longest name? Answer this question using just SQL.

Relevant topics: SQLite length

Items to submit
  • SQL query used to solve this problem.

  • Output from running this query.

Question 5

LIKE is a very powerful tool. You can read about SQLite’s version of LIKE here. Use LIKE and/or R to get a count of how many movies (type='movie') that starts with each letter of the alphabet. Can you think of another way to do this? If so, show us, and explain what you did!

Relevant topics: SQLite LIKE

Items to submit
  • Code used to solve this problem.

  • 1-2 sentences explaining what your code does and why you like your method of solving the problem.