R base functions

subset

subset is a function that helps you take subsets of data. By default, subset removes NA rows.

subset does not perform any operation that can’t be accomplished by indexing.

Examples

Using the reviews_sample csv file, make a subset of the beers that have (score != 5) & (overall == 5) (in other words the score value is not equal to 5 but the overall value is equal to 5).

Click to see solution
beerDF <- read.csv("/anvil/projects/tdm/data/beer/reviews_sample.csv")

beerSubset <- subset(beerDF, score != 5 & overall == 5)

nrow

nrow is a function that returns the number of rows of a matrix, vector, array or data.frame.

Examples

Using the reviews_sample csv file, how many lines of data are in the subset that has (score != 5) & (overall == 5)?

Click to see solution
numRows <- nrow(beerSubset)

print(numRows)
[1] 12436

table

table is a function used to build a contingency table, which is a table that shows counts for categorical data, from one or more categories. prop.table is a function that accepts table output, returning proportions of the counts.

Examples

Click to see solution
beerDF <- read.csv("/anvil/projects/tdm/data/beer/reviews_sample.csv")
tail(sort(table(beerDF$username)), 6)
       jaydoc   Texasfan549         Sammy   kylehay2004 StonedTrippin
         1489          1532          1591          1628          1630
      acurtis
         1646

mean

mean is a function that calculates the average of a vector of values.

You will often find yourself using the na.rm argument, short for NA value removal. Most real-life data will contain missing values somewhere, and na.rm = TRUE will automatically remove those values from consideration during a function call or computation. na.rm = FALSE is the default, so make sure to include na.rm = TRUE if you’re unsure of your data’s composition.

Examples

Using the reviews_sample csv file, what is the average score of the reviews that were written by the user acurtis?

Click to see solution
beerDF <- read.csv("/anvil/projects/tdm/data/beer/reviews_sample.csv")
mean(beerDF$score[beerDF$username == "acurtis"])
 3.65580194410693