R base functions

table

table builds a contingency table: a table of counts for each value (or each combination of values) of one or more categorical variables. prop.table accepts the output of table and returns the counts as proportions.
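As a minimal sketch of both functions, the snippet below uses a small made-up vector of flavors (hypothetical data, not taken from the ice cream files) to build a table of counts and then convert it to proportions.

```r
# Hypothetical data for illustration only
flavors <- c("vanilla", "chocolate", "vanilla", "mint", "chocolate", "vanilla")

# Count how many times each flavor appears
flavor_counts <- table(flavors)
flavor_counts

# Convert the counts to proportions that sum to 1
flavor_props <- prop.table(flavor_counts)
flavor_props
```

Here vanilla appears 3 times out of 6, so its proportion is 0.5.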

Examples

Show how many times each brand of ice cream appears in each of the two files: products.csv and reviews.csv

# read in the products file
products <- read.csv("/anvil/projects/tdm/data/icecream/combined/products.csv")
table(products$brand)

# read in the reviews file
reviews <- read.csv("/anvil/projects/tdm/data/icecream/combined/reviews.csv")
table(reviews$brand)
# products.csv
     bj breyers      hd talenti
     57      69      70      45

# reviews.csv
     bj breyers      hd talenti
   7943    5007    4655    4069

merge

merge combines two data.frames by common columns or row names. Its behavior mirrors the join operations in SQL: by default it performs an inner join, and the all, all.x, and all.y arguments produce full, left, and right outer joins.
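To make the SQL analogy concrete, here is a small sketch using two made-up data.frames (the keys, names, and stars values are hypothetical and only loosely modeled on the ice cream files). It contrasts the default inner join with a left join.

```r
# Hypothetical data for illustration only
products <- data.frame(key = c("1_bj", "2_bj", "3_hd"),
                       name = c("Salted Caramel", "Mint Chip", "Vanilla"))
reviews <- data.frame(key = c("1_bj", "1_bj", "3_hd"),
                      stars = c(5, 4, 3))

# Default merge: inner join, keeps only keys present in both data.frames
inner <- merge(products, reviews, by = "key")

# all.x = TRUE: left join, keeps every product and fills missing stars with NA
left <- merge(products, reviews, by = "key", all.x = TRUE)

nrow(inner)  # 3 rows: two reviews for "1_bj", one for "3_hd"
nrow(left)   # 4 rows: the unreviewed product "2_bj" has stars = NA
```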

Examples

library(data.table)
productsDF <- fread("/anvil/projects/tdm/data/icecream/combined/products.csv")
reviewsDF <- fread("/anvil/projects/tdm/data/icecream/combined/reviews.csv")

newmergedDF <- merge(productsDF, reviewsDF, by = c("brand", "key") )
ChocolateChipCookieDoughDF <- subset(newmergedDF, (brand == "bj") & (key == "16_bj"))
mean(ChocolateChipCookieDoughDF$rating, na.rm = TRUE)
4.6

Using the ice cream products and reviews files, merge the data according to the brand and key values. Then find the average number of stars for the reviews of the ice cream with name == "Half Baked®".

library(data.table)
productsDF <- fread("/anvil/projects/tdm/data/icecream/combined/products.csv")
reviewsDF <- fread("/anvil/projects/tdm/data/icecream/combined/reviews.csv")

newmergedDF <- merge(productsDF, reviewsDF, by = c("brand", "key") )
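# Compare with ==; a single = would be passed as an unused named argument,
# and subset() would then return every row
HalfBakedDF <- subset(newmergedDF, name == "Half Baked®")
mean(HalfBakedDF$stars, na.rm = TRUE)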

grep & grepl

grep stands for "globally search for a regular expression and print all matches," just as in UNIX. The function uses a regular expression to search for a pattern in a character vector and returns the indices of the elements that match.

Additionally, the function grepl (derived from grep-logical) takes the same inputs but returns a logical vector, where TRUE indicates a match at that index and FALSE indicates no match.

grep and grepl can also be used on an individual string, but they report whether the string as a whole contains a match, not the position of the matching character within it. For grep, a hit returns [1] 1 and a miss returns integer(0). For grepl, you still get TRUE or FALSE.
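The difference between the two return types can be sketched with a short made-up vector of product names (hypothetical values, not read from the ice cream files):

```r
# Hypothetical data for illustration only
names_vec <- c("Chocolate Fudge", "Vanilla Bean", "Mint Chocolate Chip")

# grep returns the indices of the elements that match
idx <- grep("Chocolate", names_vec)      # 1 3

# grepl returns one TRUE/FALSE per element
hits <- grepl("Chocolate", names_vec)    # TRUE FALSE TRUE

# On a single string: index 1 for a hit, integer(0) for a miss
grep("Van", "Vanilla Bean")     # 1
grep("Mint", "Vanilla Bean")    # integer(0)
```

Because grepl always returns a vector the same length as its input, it is usually the more convenient choice for filtering rows.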

Examples

Using the ice cream products and reviews files, merge the data according to the brand and key values. Print the average number of stars across all 4831 reviews of products that have "Chocolate" in the product name.

library(data.table)
productsDF <- fread("/anvil/projects/tdm/data/icecream/combined/products.csv")
reviewsDF <- fread("/anvil/projects/tdm/data/icecream/combined/reviews.csv")

newmergedDF <- merge(productsDF, reviewsDF, by = c("brand", "key") )
ChocolateDF <- newmergedDF[grepl("Chocolate", newmergedDF$name)]
mean(ChocolateDF$rating, na.rm = TRUE)
4.16127095839371