Think Summer: Day 4 Notes — 2024

We always need to re-load the libraries, if our kernel dies or if we start an all-new session.

Loading the R data.table library

%%R
library(data.table)

Loading the R library for SQL, and loading the database

We need to load this library, to make a connection to the database at the start. If something goes wrong with our database queries, we can always come back and run these two lines again. Ideally, we should only need to run these once per session, but sometimes we make mistakes, and our kernel dies, and we need to run these lines again.

%%R
library(RSQLite)
conn <- dbConnect(RSQLite::SQLite(), "/anvil/projects/tdm/data/movies_and_tv/imdb.db")

Importing data from SQL to R

For example, we can import the number of titles per year from SQL into R. (We are doing the work in SQL of finding out how many titles occurred in each year.)

%%R
myDF <- dbGetQuery(conn, "SELECT COUNT(*), premiered FROM titles GROUP BY premiered;")

Let’s first look at the head of the result:

%%R
head(myDF)

We can assign names to the columns of the data frame:

%%R
names(myDF) <- c("mycounts", "myyears")

and now the head of the data frame looks like this:

%%R
head(myDF)

Finally, we are prepared to plot the number of titles per year. We plot the years on the x-axis and the counts on the y-axis:

%%R
plot(myDF$myyears, myDF$mycounts)

Another way to do this is to import all of the years that titles premiered, and then make a table in R and plot the table. (This time, we are doing the work in R of finding out how many titles occurred in each year.)

%%R
myDF <- dbGetQuery(conn, "SELECT premiered FROM titles;")
%%R
head(myDF)
%%R
tail(myDF)

Now we make a table of the results:

%%R
table(myDF$premiered)

and we can plot the results:

%%R
plot(table(myDF$premiered))