R base functions
table
table is a function used to build a contingency table, which is a table of counts for categorical data, from one or more categorical variables. prop.table is a function that accepts table output and returns the counts as proportions.
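As a minimal sketch of the two functions, the snippet below builds a one-way and a two-way table and converts them to proportions. The flavors data frame and its brand and base columns are made up for illustration; they are not part of the ice cream files.

```r
# Toy data (hypothetical, not taken from the ice cream files)
flavors <- data.frame(
  brand = c("bj", "bj", "hd", "hd", "hd", "talenti"),
  base  = c("dairy", "dairy", "dairy", "vegan", "dairy", "vegan")
)

# One-way table: number of rows per brand
brand_counts <- table(flavors$brand)
print(brand_counts)

# Two-way contingency table: brand (rows) by base (columns)
two_way <- table(flavors$brand, flavors$base)
print(two_way)

# Proportions of the overall total
print(prop.table(brand_counts))

# Row-wise proportions: margin = 1 makes each row sum to 1
print(prop.table(two_way, margin = 1))
```

Passing margin = 2 instead would make each column sum to 1.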
Examples
Show how many times each brand of ice cream appears in each of the two files: products.csv and reviews.csv
Click to see solution
# read in the products file
products <- read.csv("/anvil/projects/tdm/data/icecream/combined/products.csv")
table(products$brand)
# read in the reviews file
reviews <- read.csv("/anvil/projects/tdm/data/icecream/combined/reviews.csv")
table(reviews$brand)
# products.csv
     bj breyers      hd talenti
     57      69      70      45
# reviews.csv
     bj breyers      hd talenti
   7943    5007    4655    4069
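The same counts can be turned into proportions with prop.table. The sketch below re-enters the products.csv counts by hand so that it runs without the data files; with the files available, prop.table(table(products$brand)) should give the same result.

```r
# Brand counts copied from the products.csv output above
product_counts <- as.table(c(bj = 57, breyers = 69, hd = 70, talenti = 45))

# prop.table divides each count by the total number of products
product_props <- prop.table(product_counts)

# Round for readability; the proportions sum to 1
print(round(product_props, 3))
```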
merge
merge is a function that combines two data.frames by matching values in common columns or row names. Its behavior mirrors the join operations in SQL.
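To illustrate the SQL analogy, here is a small sketch with made-up data. The product names, keys, and star values are hypothetical. The all, all.x, and all.y arguments of merge control which unmatched rows are kept.

```r
# Toy data (hypothetical, not taken from the ice cream files)
products <- data.frame(
  brand = c("bj", "bj", "hd"),
  key   = c("0_bj", "1_bj", "0_hd"),
  name  = c("Flavor A", "Flavor B", "Flavor C")
)
reviews <- data.frame(
  brand = c("bj", "bj", "hd", "talenti"),
  key   = c("0_bj", "0_bj", "0_hd", "0_talenti"),
  stars = c(5, 3, 4, 2)
)

# Inner join (default): only brand/key pairs present in both data frames
inner <- merge(products, reviews, by = c("brand", "key"))

# Left join: keep every product, filling stars with NA when there is no review
left <- merge(products, reviews, by = c("brand", "key"), all.x = TRUE)

# Full outer join: keep unmatched rows from both sides
full <- merge(products, reviews, by = c("brand", "key"), all = TRUE)

print(inner)
print(left)
print(full)
```

In this toy data the inner join has 3 rows, the left join 4, and the full join 5, because one product has no reviews and one review has no matching product.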
Examples
Using ice cream products and reviews files, merge the data according to the brand and key values. Then print the average number of stars for the "Chocolate Chip Cookie Dough" product, which has brand == "bj" and key == "16_bj"
Click to see solution
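library(data.table)
# read both files as data.tables
productsDF <- fread("/anvil/projects/tdm/data/icecream/combined/products.csv")
reviewsDF <- fread("/anvil/projects/tdm/data/icecream/combined/reviews.csv")
# join products and reviews on the shared brand and key columns
newmergedDF <- merge(productsDF, reviewsDF, by = c("brand", "key") )
# keep only the rows for the product with brand "bj" and key "16_bj"
ChocolateChipCookieDoughDF <- subset(newmergedDF, (brand == "bj") & (key == "16_bj"))
# average the rating column, ignoring missing values
mean(ChocolateChipCookieDoughDF$rating, na.rm = TRUE)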
4.6
Using ice cream products and reviews files, merge the data according to the brand and key values. Then find the average number of stars for the reviews of ice cream with name == "Half Baked\302\256" (that is, "Half Baked®", with the ® symbol written as its UTF-8 bytes).
Click to see solution
library(data.table)
productsDF <- fread("/anvil/projects/tdm/data/icecream/combined/products.csv")
reviewsDF <- fread("/anvil/projects/tdm/data/icecream/combined/reviews.csv")
newmergedDF <- merge(productsDF, reviewsDF, by = c("brand", "key") )
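# keep only the rows whose name matches exactly; the comparison needs ==,
# because a single = is not treated as a filter condition and every row is kept
# "\302\256" is the UTF-8 byte sequence for the ® symbol
HalfBakedDF <- subset(newmergedDF, name == "Half Baked\302\256")
mean(HalfBakedDF$stars, na.rm = TRUE)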
grep & grepl
grep stands for "globally search for a regular expression and print all matches," just as in UNIX. The function uses a regular expression to search for a pattern in a character vector and returns the index (or indices) of the matching element(s).
Additionally, the function grepl (derived from grep-logical) takes the same inputs but returns a logical vector, where TRUE indicates a match at that index and FALSE indicates no match.
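The difference between the two return types can be seen in a short sketch. The flavor_names vector is made up for illustration.

```r
# Toy character vector (hypothetical flavor names)
flavor_names <- c("Chocolate Fudge", "Vanilla Bean", "Mint Chocolate Chip", "Strawberry")

# grep returns the positions of the matching elements
idx <- grep("Chocolate", flavor_names)
print(idx)

# grepl returns one TRUE/FALSE per element
hits <- grepl("Chocolate", flavor_names)
print(hits)

# value = TRUE returns the matching strings instead of their positions
matched <- grep("Chocolate", flavor_names, value = TRUE)
print(matched)

# Summing the logical vector counts the matches
n_matches <- sum(hits)
print(n_matches)

# Patterns are regular expressions: ^ anchors the match at the start of the string
starts_with <- grepl("^chocolate", flavor_names, ignore.case = TRUE)
print(starts_with)
```

Because grepl always returns a vector the same length as its input, it is convenient for filtering rows, while grep is handy when the positions themselves are needed.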
Examples
Using ice cream products and reviews files, merge the data according to the brand and key values. Print the average number of stars for all 4831 of the reviews of these products that have "Chocolate" in the product name.
Click to see solution
library(data.table)
productsDF <- fread("/anvil/projects/tdm/data/icecream/combined/products.csv")
reviewsDF <- fread("/anvil/projects/tdm/data/icecream/combined/reviews.csv")
newmergedDF <- merge(productsDF, reviewsDF, by = c("brand", "key") )
# keep the rows whose product name contains "Chocolate"
ChocolateDF <- newmergedDF[grepl("Chocolate", newmergedDF$name), ]
mean(ChocolateDF$rating, na.rm = TRUE)
4.16127095839371