5. SQL in R

Concepts

Maggie still needs to add content.

Demo

Please use seminar-r for this part.

We will show how to make SQL calls from R, so that we can (for example) make plots related to our SQL queries. When we make a SQL call in R, the data is returned as an R database.

Connect to the IMDb SQLite database

conn <- dbConnect(RSQLite::SQLite(), "/anvil/projects/tdm/data/movies_and_tv/imdb.db")

Also load libraries we will be using in R:

library(RSQLite)
library(dplyr)
library(ggplot2)

Check available tables:

dbListTables(conn)

Write the first SQL query in R:

dbGetQuery(conn, "SELECT * FROM titles LIMIT 5;")

Start asking some questions: what type of titles Quentin Tarantino mainly works on (following query takes some time)

tarantino_types <- dbGetQuery(conn, "
SELECT t.type, COUNT(*) AS count
FROM titles AS t
JOIN crew AS c ON t.title_id = c.title_id
JOIN people AS p ON c.person_id = p.person_id
WHERE p.name = 'Quentin Tarantino'
GROUP BY t.type
ORDER BY count DESC;
")

tarantino_types

Plot title types

ggplot(tarantino_types, aes(x = reorder(type, count), y = count)) +
  geom_bar(stat = "identity", fill = "#0073C2FF") +
  coord_flip() +
  labs(title = "Types of Titles Quentin Tarantino Worked On",
       x = "Title Type", y = "Count") +
  theme_minimal()

SQL syntax is NOT case sensitive. However, remember that R is case sensitive, so please be careful with the capitalization of data frame and variable names.

Practice on your own

Maggie still needs to add content.