R base functions
subset
subset is a function that helps you take subsets of data. Rows for which the condition evaluates to NA are dropped, because subset treats a missing condition as FALSE.
subset does not perform any operation that can’t be accomplished by indexing.
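To make the NA behavior concrete, here is a minimal sketch comparing subset with plain indexing. The data frame toyDF and its values are hypothetical and only serve as an illustration; they are not taken from the reviews data.

```r
# Toy data frame (hypothetical values, for illustration only)
toyDF <- data.frame(
  score   = c(5, 4, NA, 3),
  overall = c(5, 5, 5, 4)
)

# subset() drops the row where the condition evaluates to NA
viaSubset <- subset(toyDF, score != 5 & overall == 5)

# Plain logical indexing returns an extra all-NA row for the NA condition
viaIndex <- toyDF[toyDF$score != 5 & toyDF$overall == 5, ]

# Wrapping the condition in which() discards the NA cases,
# which selects the same rows as subset()
viaWhich <- toyDF[which(toyDF$score != 5 & toyDF$overall == 5), ]

nrow(viaSubset)
nrow(viaIndex)
nrow(viaWhich)
```

With this toy data, the subset and which versions keep a single row, while plain logical indexing returns two rows, one of them filled with NA.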
Examples
Using the reviews_sample.csv file, make a subset of the reviews for which score != 5 & overall == 5 (in other words, the score value is not equal to 5 but the overall value is equal to 5).
Solution
beerDF <- read.csv("/anvil/projects/tdm/data/beer/reviews_sample.csv")
beerSubset <- subset(beerDF, score != 5 & overall == 5)
table
table is a function used to build a contingency table, which is a table that shows counts for categorical data, from one or more categories. prop.table is a function that accepts table output and returns the counts as proportions.
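As a small sketch of how table and prop.table fit together, the following uses two hypothetical vectors, styles and rating, invented purely for illustration.

```r
# Hypothetical categorical data (not from the reviews file)
styles <- c("IPA", "Stout", "IPA", "Lager", "IPA", "Stout")
rating <- c("high", "low", "high", "low", "low", "high")

# One-way contingency table: counts per category
counts <- table(styles)

# Convert the counts to proportions that sum to 1
props <- prop.table(counts)

# Two-way contingency table: counts for each combination of categories
twoWay <- table(styles, rating)

counts
props
twoWay
```

Here counts holds 3 for IPA, 1 for Lager, and 2 for Stout, and props holds the same values divided by the total of 6.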
Examples
Using the reviews_sample.csv file, make a table of the values in the column beerDF$username, but do not print all of the values. Sort the values and show only the tail, so that you can see the 6 most frequent username values and the number of reviews that each of these 6 people wrote.
Solution
beerDF <- read.csv("/anvil/projects/tdm/data/beer/reviews_sample.csv")
tail(sort(table(beerDF$username)), 6)
       jaydoc   Texasfan549         Sammy   kylehay2004 StonedTrippin       acurtis
         1489          1532          1591          1628          1630          1646
mean
mean is a function that calculates the average of a vector of values.
You will often find yourself using the na.rm argument, short for NA value removal. Most real-life data will contain missing values somewhere, and na.rm = TRUE removes those values before the computation. na.rm = FALSE is the default, in which case mean returns NA if any value is missing, so make sure to include na.rm = TRUE if you’re unsure of your data’s composition.
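A quick sketch of the difference, using a hypothetical vector scores made up for illustration:

```r
# Hypothetical vector with one missing value
scores <- c(4, 5, NA, 3)

# Default (na.rm = FALSE): the missing value propagates to the result
withNA <- mean(scores)

# na.rm = TRUE: the NA is dropped, so the mean is taken over 4, 5, and 3
withoutNA <- mean(scores, na.rm = TRUE)

withNA     # NA
withoutNA  # 4
```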