R base functions
table
table is a function that builds a contingency table: a table of counts for categorical data, computed from one or more categorical variables. prop.table is a function that accepts the output of table and returns the counts as proportions.
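To make the relationship between table and prop.table concrete, here is a minimal sketch on a small made-up data set. The sex and race vectors are hypothetical and are not taken from DeathRecords; the sketch only illustrates a one-way table, a two-way table, and their proportions.

```r
# Hypothetical toy data, not taken from the DeathRecords file
sex  <- c("F", "M", "F", "F", "M")
race <- c(1, 1, 2, 1, 2)

# One-way table: counts for each value of a single variable
counts <- table(sex)
print(counts)

# Proportions of those counts (they sum to 1)
print(prop.table(counts))

# Two-way contingency table: counts for each combination of sex and race
two_way <- table(sex, race)
print(two_way)

# Row-wise proportions: each row sums to 1
print(prop.table(two_way, margin = 1))
```

Passing margin = 1 gives proportions within each row, and margin = 2 gives proportions within each column; leaving margin out divides every cell by the grand total.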
Examples
In the DeathRecords data file, create a table of the values in the Race column and how many times each Race value occurs.
deathDF <- read.csv("/anvil/projects/tdm/data/death_records/DeathRecords.csv")
table(deathDF$Race)
      1       2       3       4       5       6       7      18      28      38
2241510  309504   18031   13297    8159     700   11074    6778    4711     623
     48      58      68      78
   4913     316    8737    2818
In the DeathRecords data file, create a table showing how many people are in each of the 6 categories below at the time of their death.
The labels for part a should be the default labels, i.e., like this: (-Inf,18] (18,25] (25,35] (35,55] (55,150] (150,Inf]. The solution below uses 0 rather than -Inf as the lowest break, so its first label is (0,18].
"youth": less than or equal to 18 years old
"young adult": older than 18 but less than or equal to 25 years old
"adult": older than 25 but less than or equal to 35 years old
"middle age adult": older than 35 but less than or equal to 55 years old
"senior adult": greater than 55 years old but less than or equal to 150 years old (or any other upper threshold that you like)
"unknown": age of 999 (you could use, say, ages 150 to Inf for this category)
death_records <- read.csv("/anvil/projects/tdm/data/death_records/DeathRecords.csv")
ages <- death_records$Age
age_groups <- cut(ages, breaks = c(0, 18, 25, 35, 55, 150, Inf))
first_table <- table(age_groups)
print(first_table)
age_groups
   (0,18]   (18,25]   (25,35]   (35,55]  (55,150] (150,Inf]
    36033     27691     49540    271181   2246155       571
In the DeathRecords data file, create a table showing how many people are in each of the 6 categories at the time of their death, this time labeling the table with the 6 category names below.
The labels for part a should be the default labels, i.e., like this: (-Inf,18] (18,25] (25,35] (35,55] (55,150] (150,Inf]
"youth": less than or equal to 18 years old
"young adult": older than 18 but less than or equal to 25 years old
"adult": older than 25 but less than or equal to 35 years old
"middle age adult": older than 35 but less than or equal to 55 years old
"senior adult": greater than 55 years old but less than or equal to 150 years old (or any other upper threshold that you like)
"unknown": age of 999 (you could use, say, ages 150 to Inf for this category)
death_records <- read.csv("/anvil/projects/tdm/data/death_records/DeathRecords.csv")
ages <- death_records$Age
second_age_groups <- cut(ages,
                         breaks = c(0, 18, 25, 35, 55, 150, Inf),
                         labels = c("youth", "young adult", "adult", "middle age adult", "senior adult", "unknown"))
second_table <- table(second_age_groups)
print(second_table)
second_age_groups
           youth      young adult            adult middle age adult
           36033            27691            49540           271181
    senior adult          unknown
         2246155              571
cut
cut divides a vector into intervals specified by the breaks argument and returns the result as a factor. cut is particularly useful for breaking Date data into quarters (Q1, Q2), years (1999, 2000, 2001), and so on.
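The Date use case can be sketched as follows. The dates vector is hypothetical and chosen only for illustration. Note that cut on a Date vector labels each level with the first day of its interval rather than with "Q1" or "1999"; the base functions quarters and format are shown as one way to get those shorter labels.

```r
# Hypothetical dates, chosen only for illustration
dates <- as.Date(c("1999-02-15", "1999-11-03", "2000-05-20", "2001-07-04"))

# Group by calendar year; each level is labelled with the first day of its year
by_year <- cut(dates, breaks = "year")
print(table(by_year))

# Group by calendar quarter; each level is labelled with the first day of its quarter
by_quarter <- cut(dates, breaks = "quarter")
print(head(by_quarter))

# quarters() and format() give the "Q1" and "1999" style labels directly
print(quarters(dates))
print(format(dates, "%Y"))
```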
Examples
Use the cut command to classify people at their time of death into 6 categories:
"youth": less than or equal to 18 years old
"young adult": older than 18 but less than or equal to 25 years old
"adult": older than 25 but less than or equal to 35 years old
"middle age adult": older than 35 but less than or equal to 55 years old
"senior adult": greater than 55 years old but less than or equal to 150 years old (or any other upper threshold that you like)
"unknown": age of 999 (you could use, say, ages 150 to Inf for this category)
death_records <- read.csv("/anvil/projects/tdm/data/death_records/DeathRecords.csv")
ages <- death_records$Age
# sort into categories but no labels
age_groups <- cut(ages, breaks = c(0, 18, 25, 35, 55, 150, Inf))
# add labels corresponding to the 6 categories above
second_age_groups <- cut(ages,
                         breaks = c(0, 18, 25, 35, 55, 150, Inf),
                         labels = c("youth", "young adult", "adult", "middle age adult", "senior adult", "unknown"))
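To see how the interval boundaries behave, the same breaks can be tried on a handful of values. This is only a sketch: the ages vector below is hypothetical and picked to land on the boundaries, and it relies on the default cut behavior of right-closed intervals that exclude the lowest break.

```r
# Hypothetical ages chosen to hit the interval boundaries
ages <- c(0, 5, 18, 19, 30, 55, 56, 999)

# Intervals are right-closed by default, so 18 falls in (0,18] and 19 in (18,25]
groups <- cut(ages, breaks = c(0, 18, 25, 35, 55, 150, Inf))
print(data.frame(ages, groups))

# An age of exactly 0 is not inside (0,18], so it becomes NA
print(is.na(groups[1]))

# Starting the breaks at -Inf keeps such values in the first category
groups_from_neg_inf <- cut(ages, breaks = c(-Inf, 18, 25, 35, 55, 150, Inf))
print(groups_from_neg_inf[1])
```

Whether the lowest break should be 0 or -Inf depends on whether values of 0 can occur in the data and should be counted in the first category.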
subset
subset is a function that helps you take subsets of data. subset treats a condition that evaluates to NA as FALSE, so rows with a missing value in the condition are dropped.
subset does not perform any operation that can't be accomplished by indexing.
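The difference in NA handling between subset and plain indexing can be sketched on a small made-up data frame. The df object and its values are hypothetical; only the column names mirror DeathRecords.

```r
# Hypothetical toy data frame with missing values
df <- data.frame(Sex = c("F", "M", NA, "F"),
                 Age = c(75, 999, 40, NA))

# subset() treats an NA condition as FALSE, so the row with a missing Sex is dropped
by_subset <- subset(df, Sex == "F")
print(by_subset)

# Plain logical indexing keeps an all-NA row wherever the condition is NA
by_index <- df[df$Sex == "F", ]
print(by_index)

# Wrapping the condition in which() reproduces the subset() result
by_which <- df[which(df$Sex == "F"), ]
print(by_which)
```

So indexing can do everything subset does, but the NA handling has to be written out explicitly, for example with which or with !is.na.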
Examples
In the DeathRecords data file, show the head of the subset of data for which Sex=='F'.
deathDF <- read.csv("/anvil/projects/tdm/data/death_records/DeathRecords.csv")
femaleSubset <- subset(deathDF, Sex == 'F')
head(femaleSubset)
(Sex is <chr>; every other column shown is <int>. The columns between AgeSubstitutionFlag and CauseRecode39 are elided, as in the original output.)
   Id ResidentStatus Education1989Revision Education2003Revision EducationReportingFlag
3   3              1                     0                     7                      1
6   6              1                     0                     5                      1
9   9              1                     0                     3                      1
11 11              1                     0                     3                      1
13 13              1                     0                     4                      1
14 14              1                     0                     3                      1
   MonthOfDeath Sex AgeType Age AgeSubstitutionFlag
3             1   F       1  75                   0
6             1   F       1  93                   0
9             1   F       1  86                   0
11            1   F       1  79                   0
13            1   F       1  85                   0
14            1   F       1  84                   0
   ...
   CauseRecode39 NumberOfEntityAxisConditions NumberOfRecordAxisConditions Race BridgedRaceFlag
3             28                            2                            2    1               0
6             37                            5                            5    1               0
9             37                            1                            1    1               0
11            22                            2                            2    1               0
13            22                            5                            5    1               0
14             8                            2                            2    1               0
   RaceImputationFlag RaceRecode3 RaceRecode5 HispanicOrigin HispanicOriginRaceRecode
3                   0           1           1            100                        6
6                   0           1           1            100                        6
9                   0           1           1            100                        6
11                  0           1           1            100                        6
13                  0           1           1            100                        6
14                  0           1           1            100                        6
In the DeathRecords data file, show the head of the subset of data for which Sex=='F' & Age!=999.
deathDF <- read.csv("/anvil/projects/tdm/data/death_records/DeathRecords.csv")
validFemaleSubset <- subset(deathDF, Sex == 'F' & Age != 999)
head(validFemaleSubset)
(Sex is <chr>; every other column shown is <int>. The columns between AgeSubstitutionFlag and CauseRecode39 are elided, as in the original output.)
   Id ResidentStatus Education1989Revision Education2003Revision EducationReportingFlag
3   3              1                     0                     7                      1
6   6              1                     0                     5                      1
9   9              1                     0                     3                      1
11 11              1                     0                     3                      1
13 13              1                     0                     4                      1
14 14              1                     0                     3                      1
   MonthOfDeath Sex AgeType Age AgeSubstitutionFlag
3             1   F       1  75                   0
6             1   F       1  93                   0
9             1   F       1  86                   0
11            1   F       1  79                   0
13            1   F       1  85                   0
14            1   F       1  84                   0
   ...
   CauseRecode39 NumberOfEntityAxisConditions NumberOfRecordAxisConditions Race BridgedRaceFlag
3             28                            2                            2    1               0
6             37                            5                            5    1               0
9             37                            1                            1    1               0
11            22                            2                            2    1               0
13            22                            5                            5    1               0
14             8                            2                            2    1               0
   RaceImputationFlag RaceRecode3 RaceRecode5 HispanicOrigin HispanicOriginRaceRecode
3                   0           1           1            100                        6
6                   0           1           1            100                        6
9                   0           1           1            100                        6
11                  0           1           1            100                        6
13                  0           1           1            100                        6
14                  0           1           1            100                        6