Think Summer: Project 5 — 2022

Submission

This project is not graded; you do not need to submit anything today.

Questions

Question 1

What is the primary_title of the TV series, movie, short, etc. (any type in the titles table) that has been most widely distributed? We don’t really have the information needed to answer this question directly, so let’s consider the most widely distributed title to be the title_id that appears most often in the akas table.
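As a rough illustration of the "count rows per title_id, then join back to titles" idea, here is a minimal sketch against a tiny in-memory stand-in for the database. The column names (title_id, primary_title) and the toy rows are assumptions made for the example, and Python's built-in sqlite3 module is used only so the sketch runs on its own; the SQL string is the part that carries over.

```python
import sqlite3

# Toy in-memory stand-in for the course database. The table layout and
# the rows below are assumptions for illustration only.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE titles (title_id TEXT PRIMARY KEY, type TEXT, primary_title TEXT);
CREATE TABLE akas (title_id TEXT, title TEXT);
INSERT INTO titles VALUES ('tt1', 'movie', 'Alpha'), ('tt2', 'tvSeries', 'Beta');
INSERT INTO akas VALUES ('tt1', 'Alpha (FR)'), ('tt2', 'Beta (FR)'),
                        ('tt2', 'Beta (DE)'), ('tt2', 'Beta (ES)');
""")

# Count akas rows per title, attach the primary_title, keep the largest count.
query = """
SELECT t.primary_title, COUNT(*) AS n_akas
FROM akas AS a
JOIN titles AS t ON t.title_id = a.title_id
GROUP BY t.title_id, t.primary_title
ORDER BY n_akas DESC
LIMIT 1;
"""
most_distributed = con.execute(query).fetchone()
print(most_distributed)
```

On the toy data, 'Beta' has three akas rows and 'Alpha' has one, so 'Beta' comes back as the single result.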

Relevant topics: SQL, queries, joins, aggregate functions

Items to submit
  • SQL used to solve the problem.

  • Output from running the code.

Question 2

What is the average rating of movies (specifically, rows where type in the titles table is movie) with at least 10,000 votes (from the ratings table), grouped by the year in which they premiered? Use SQL in combination with R to answer this question, and create a graphic that illustrates the ratings. Do you notice any trends? If so, what are they?
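The query half of this question might look something like the sketch below, which filters on type and votes and then averages per premiere year. The column names (type, premiered, rating, votes) and the toy rows are assumptions, and Python's built-in sqlite3 module is used only to keep the example self-contained; the plotting step in R is not shown.

```python
import sqlite3

# Toy in-memory stand-in for the course database. Column names and rows
# are assumptions for illustration only.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE titles (title_id TEXT PRIMARY KEY, type TEXT, premiered INTEGER);
CREATE TABLE ratings (title_id TEXT, rating REAL, votes INTEGER);
INSERT INTO titles VALUES ('tt1', 'movie', 2000), ('tt2', 'movie', 2000),
                          ('tt3', 'movie', 2001), ('tt4', 'tvSeries', 2001),
                          ('tt5', 'movie', 2001);
INSERT INTO ratings VALUES ('tt1', 8.0, 20000), ('tt2', 6.0, 15000),
                           ('tt3', 7.5, 50000), ('tt4', 9.0, 90000),
                           ('tt5', 2.0, 500);
""")

# Keep only movies with at least 10,000 votes, then average by premiere year.
# This is an unweighted average: each movie counts once, regardless of votes.
query = """
SELECT t.premiered, AVG(r.rating) AS avg_rating, COUNT(*) AS n_movies
FROM titles AS t
JOIN ratings AS r ON r.title_id = t.title_id
WHERE t.type = 'movie' AND r.votes >= 10000
GROUP BY t.premiered
ORDER BY t.premiered;
"""
rows = con.execute(query).fetchall()
for year, avg_rating, n_movies in rows:
    print(year, avg_rating, n_movies)
```

In the toy data, tt4 is dropped because it is not a movie and tt5 is dropped for having too few votes. Returning the per-year count alongside the average makes it easier to judge how much weight a given year's average deserves when looking for trends.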

Relevant topics: SQL, queries, joins, aggregate functions

Items to submit
  • R code used to solve the problem.

  • Output from running the code (including the graphic).

  • 1-2 sentences explaining what trends, if any, you see.

Question 3

Get the name and number of appearances (count of person_id in the crew table) of the 15 people from the people table with the most appearances.
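One possible shape for this query is sketched below: join crew to people, count rows per person, and keep the largest counts. The column names (person_id, name) and the toy rows are assumptions, and Python's built-in sqlite3 module is used only so the example runs on its own.

```python
import sqlite3

# Toy in-memory stand-in for the course database. Column names and rows
# are assumptions for illustration only.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE people (person_id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE crew (title_id TEXT, person_id TEXT);
INSERT INTO people VALUES ('nm1', 'Ann'), ('nm2', 'Bob'), ('nm3', 'Cy');
INSERT INTO crew VALUES ('tt1', 'nm1'), ('tt2', 'nm1'), ('tt3', 'nm1'),
                        ('tt1', 'nm2'), ('tt2', 'nm2'),
                        ('tt1', 'nm3');
""")

# Count crew rows per person and keep the 15 largest counts.
query = """
SELECT p.name, COUNT(c.person_id) AS appearances
FROM crew AS c
JOIN people AS p ON p.person_id = c.person_id
GROUP BY p.person_id, p.name
ORDER BY appearances DESC, p.name
LIMIT 15;
"""
top_people = con.execute(query).fetchall()
print(top_people)
```

Grouping by person_id rather than by name alone keeps two different people who happen to share a name from being merged into one count.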

Relevant topics: SQL, queries, joins, aggregate functions

Items to submit
  • SQL used to solve the problem.

  • Output from running the code.

Question 4

Wow! Those are some pretty large numbers! What if we asked the same question, but limited appearances to titles with at least 10,000 votes? Write an SQL query and compare the results. How did the results shift? Are there any apparent themes in the results?
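A minimal sketch of the filtered version follows: the same count per person, with an extra join to ratings so that only titles with at least 10,000 votes contribute. The column names and toy rows are assumptions, and Python's built-in sqlite3 module is used only to keep the example self-contained.

```python
import sqlite3

# Toy in-memory stand-in for the course database. Column names and rows
# are assumptions for illustration only.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE people (person_id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE crew (title_id TEXT, person_id TEXT);
CREATE TABLE ratings (title_id TEXT, rating REAL, votes INTEGER);
INSERT INTO people VALUES ('nm1', 'Ann'), ('nm2', 'Bob');
INSERT INTO crew VALUES ('tt1', 'nm1'), ('tt3', 'nm1'), ('tt4', 'nm1'),
                        ('tt1', 'nm2'), ('tt2', 'nm2');
INSERT INTO ratings VALUES ('tt1', 8.1, 50000), ('tt2', 7.4, 12000),
                           ('tt3', 6.0, 100), ('tt4', 5.5, 50);
""")

# Same count as before, but only crew rows whose title has >= 10,000 votes.
query = """
SELECT p.name, COUNT(c.person_id) AS appearances
FROM crew AS c
JOIN people AS p ON p.person_id = c.person_id
JOIN ratings AS r ON r.title_id = c.title_id
WHERE r.votes >= 10000
GROUP BY p.person_id, p.name
ORDER BY appearances DESC, p.name
LIMIT 15;
"""
top_people_popular = con.execute(query).fetchall()
print(top_people_popular)
```

In the toy data, Ann has the most appearances overall (three), but only one of them is in a title with enough votes, so Bob moves ahead once the filter is applied. The inner join to ratings also drops any title that has no ratings row at all.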

Relevant topics: SQL, queries, joins, aggregate functions

Items to submit
  • SQL used to solve the problem.

  • Output from running the code.

Question 5

You’ve spent a long time working with this database! Now it is time to come up with your own question about the database and answer it using SQL, R, or a combination of the two. Using your results, create the most interesting and tricked-out graphic you can come up with! The base R plotting functions and a library called ggplot2 are the best tools for the job (at least in R).

Too easy? Create multiple graphics on the same page (maybe for a different TV series or genre), and theme everything to look like a nice, finished product.

Still too easy? Create a function in R that, given a title_id, queries the database and generates a customized graphic based on the movie or TV series provided. You could go as far as scraping an image from imdb.com and using it as a backdrop for your graphic. Get creative and make your masterpiece.
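The querying half of such a function could be sketched as below, with the title_id passed as a bound parameter instead of being pasted into the SQL string. The function name title_summary, the column names, and the toy rows are all hypothetical, and the sketch is written in Python with the built-in sqlite3 module only so it runs on its own; the graphic itself is not shown.

```python
import sqlite3

# Toy in-memory stand-in for the course database. Column names and rows
# are assumptions for illustration only.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE titles (title_id TEXT PRIMARY KEY, type TEXT, primary_title TEXT);
CREATE TABLE ratings (title_id TEXT, rating REAL, votes INTEGER);
INSERT INTO titles VALUES ('tt1', 'movie', 'Alpha'), ('tt2', 'tvSeries', 'Beta');
INSERT INTO ratings VALUES ('tt1', 8.1, 50000);
""")


def title_summary(con, title_id):
    """Hypothetical helper: fetch the fields a graphic might need for one title.

    Returns a dict, or None when the title_id is not in the titles table.
    """
    row = con.execute(
        """
        SELECT t.primary_title, t.type, r.rating, r.votes
        FROM titles AS t
        LEFT JOIN ratings AS r ON r.title_id = t.title_id
        WHERE t.title_id = ?;
        """,
        (title_id,),  # bound parameter, not string formatting
    ).fetchone()
    if row is None:
        return None
    return dict(zip(("primary_title", "type", "rating", "votes"), row))


print(title_summary(con, "tt1"))
print(title_summary(con, "tt2"))  # no ratings row, so rating and votes are None
```

The LEFT JOIN keeps titles that have no ratings row, so the plotting step can decide how to handle a missing rating rather than getting no result at all.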

Relevant topics: SQL, queries, joins, aggregate functions

Items to submit
  • SQL and/or R code used to solve the problem.

  • Output from running the code (including the graphic).