R base
table is a function used to build a contingency table, which is a table that shows counts for categorical data, from one or more categorical variables. prop.table is a function that accepts table output and returns the counts as proportions.
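As a small illustration, the sketch below builds a contingency table from a made-up vector of country codes and then converts the counts to proportions. The vector codes is hypothetical and is not taken from the Olympics data.

```r
# Hypothetical toy data, used only for illustration
codes <- c("USA", "FRA", "USA", "GBR", "USA", "FRA")

# Count how many times each value appears
counts <- table(codes)
print(counts)

# Convert the counts to proportions (these sum to 1)
props <- prop.table(counts)
print(props)
```

Because prop.table works on the output of table, the two calls are often nested, as in prop.table(table(codes)).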
In the Olympics data, which value appears in the "NOC" column the most times?
Click to see solution
myDF <- read.csv("/anvil/projects/tdm/data/olympics/athlete_events.csv")
head(sort(table(myDF$NOC), decreasing=TRUE), n=1)
USA 18853
grep & grepl
grep stands for "globally search for a regular expression and print all matches," just as in UNIX. The function lets you use regular expressions to search for a pattern in a character vector, and returns the index (or indices) of the matching element(s).
Additionally, the function grepl (derived from grep-logical) uses the same inputs, but returns a logical vector, where TRUE indicates a match at that index, and FALSE indicates the opposite.
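To make the difference between the two return values concrete, here is a minimal sketch on a made-up vector of team names. The vector teams is hypothetical and is used only for illustration.

```r
# Hypothetical toy data, used only for illustration
teams <- c("Denmark", "Sweden", "Denmark-1", "Norway")

# grep returns the positions of the elements that match the pattern
idx <- grep("Denmark", teams)
print(idx)

# grepl returns one TRUE/FALSE value per element
hits <- grepl("Denmark", teams)
print(hits)

# sum() counts the TRUE values, i.e. the number of matching elements
print(sum(hits))
```

Since grepl returns one value per element, its result can be used directly to subset the vector (teams[hits]) or combined with other logical conditions using &.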
How many rows have "Denmark" in the team name ("Denmark" may or may not be the exact team name)?
Click to see solution
table(grepl("Denmark", myDF$Team))["TRUE"]
TRUE: 3496
Find the names of the teams that have "Denmark" in the team name but are not exactly "Denmark".
Click to see solution
myDF$Team[grepl("Denmark", myDF$Team) & myDF$Team != "Denmark"]
'Denmark/Sweden' 'Denmark-2' 'Denmark-1' 'Denmark-1' 'Denmark-1' 'Denmark-2' 'Denmark-1' 'Denmark-2' 'Denmark-2' 'Denmark-2' 'Miss Denmark 1964' 'Denmark-1' 'Denmark-1' 'Denmark-2' 'Denmark-2' 'Denmark-3' 'Denmark-1' 'Denmark-2' 'Denmark-2' 'Denmark-1' 'Denmark-2' 'Denmark-2' 'Denmark-1' 'Denmark-2' 'Denmark-1' 'Denmark-1' 'Denmark-1' 'Denmark-2' 'Denmark-2' 'Denmark-2' 'Denmark-1' 'Denmark-2' 'Denmark-1' 'Denmark-2' 'Denmark-2' 'Denmark-2' 'Denmark-2' 'Denmark-4' 'Denmark-2' 'Denmark-1' 'Denmark/Sweden' 'Denmark-2' 'Denmark-1' 'Denmark-2' 'Denmark-1' 'Denmark-2' 'Denmark-1' 'Denmark-1' 'Denmark-1' 'Denmark-1' 'Miss Denmark 1964' 'Denmark-1' 'Denmark-1' 'Denmark-1' 'Denmark-3' 'Denmark-2' 'Denmark-2' 'Denmark-2' 'Denmark-1' 'Denmark/Sweden' 'Denmark/Sweden' 'Denmark-2' 'Denmark/Sweden' 'Denmark-1' 'Denmark-1' 'Denmark-2' 'Denmark-1' 'Denmark-4' 'Denmark-1' 'Denmark-2' 'Denmark-1' 'Denmark/Sweden'