R base functions

table

table is a function that builds a contingency table, which is a table of counts for each value (or combination of values) of one or more categorical variables. prop.table is a function that accepts the output of table and returns the counts as proportions.
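
As a minimal sketch of how the two functions fit together, the following uses a small made-up vector. The name colors and its values are illustrative assumptions and are not taken from the DeathRecords data.

```r
# Hypothetical toy data, not taken from DeathRecords
colors <- c("red", "blue", "red", "green", "red", "blue")

# Count how many times each value occurs
color_counts <- table(colors)
print(color_counts)

# Convert the counts into proportions that sum to 1
color_props <- prop.table(color_counts)
print(color_props)
```

The same pattern applies to a data frame column: pass the column to table, then pass the result to prop.table.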

Examples

In the DeathRecords data file, create a table of the values in the Race column, showing how many times each Race value occurs.

Click to see solution
deathDF <- read.csv("/anvil/projects/tdm/data/death_records/DeathRecords.csv")
table(deathDF$Race)
      1       2       3       4       5       6       7      18      28      38
2241510  309504   18031   13297    8159     700   11074    6778    4711     623
     48      58      68      78
   4913     316    8737    2818

In the DeathRecords data file, create a table showing how many people are in each of the 6 categories below at the time of their death.

Use the default labels that cut produces from the break points. With the breaks used in the solution, they look like this: (0,18] (18,25] (25,35] (35,55] (55,150] (150,Inf]. If you use -Inf as the lowest break instead, the first label is (-Inf,18].

"youth": less than or equal to 18 years old
"young adult": older than 18 but less than or equal to 25 years old
"adult": older than 25 but less than or equal to 35 years old
"middle age adult": older than 35 but less than or equal to 55 years old
"senior adult": greater than 55 years old but less than or equal to 150 years old (or any other upper threshold that you like)
"unknown": age of 999 (you could use, say, ages 150 to Inf for this category)
Click to see solution
death_records <- read.csv("/anvil/projects/tdm/data/death_records/DeathRecords.csv")

ages <- death_records$Age

age_groups <- cut(ages, breaks = c(0, 18, 25, 35, 55, 150, Inf))
first_table <- table(age_groups)
print(first_table)
age_groups
   (0,18]   (18,25]   (25,35]   (35,55]  (55,150] (150,Inf]
    36033     27691     49540    271181   2246155       571

In the DeathRecords data file, create a table showing how many people are in each of the 6 categories below at the time of their death, this time labeling the groups with the category names instead of the default interval labels.

"youth": less than or equal to 18 years old
"young adult": older than 18 but less than or equal to 25 years old
"adult": older than 25 but less than or equal to 35 years old
"middle age adult": older than 35 but less than or equal to 55 years old
"senior adult": greater than 55 years old but less than or equal to 150 years old (or any other upper threshold that you like)
"unknown": age of 999 (you could use, say, ages 150 to Inf for this category)
Click to see solution
death_records <- read.csv("/anvil/projects/tdm/data/death_records/DeathRecords.csv")

ages <- death_records$Age

second_age_groups <- cut(ages,
                         breaks = c(0, 18, 25, 35, 55, 150, Inf),
                         labels = c("youth", "young adult", "adult", "middle age adult", "senior adult", "unknown"))

second_table <- table(second_age_groups)
print(second_table)
second_age_groups
           youth      young adult            adult middle age adult
           36033            27691            49540           271181
    senior adult          unknown
         2246155              571

cut

cut divides a numeric vector into intervals defined by the breaks argument and returns a factor whose levels are those intervals. cut also works on Date data, where it is particularly useful for grouping dates into quarters, years (1999, 2000, 2001), and so on.
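
As a rough illustration of the Date case, cut has a method for Date vectors that accepts interval names such as "month", "quarter", or "year" as breaks. The sketch below uses made-up dates (the names dates and by_year are assumptions for illustration). The resulting levels are labeled by the first day of each interval, so the base function quarters is shown as one way to get labels like Q1.

```r
# Hypothetical dates for illustration
dates <- as.Date(c("1999-03-15", "1999-11-02", "2000-06-30", "2001-01-10"))

# Group the dates by calendar year; each level is named by the year's first day
by_year <- cut(dates, breaks = "year")
print(table(by_year))

# quarters() returns the quarter of each date as "Q1" through "Q4"
print(quarters(dates))
```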

Examples

Use the cut command to classify people at their time of death into 6 categories:

"youth": less than or equal to 18 years old
"young adult": older than 18 but less than or equal to 25 years old
"adult": older than 25 but less than or equal to 35 years old
"middle age adult": older than 35 but less than or equal to 55 years old
"senior adult": greater than 55 years old but less than or equal to 150 years old (or any other upper threshold that you like)
"unknown": age of 999 (you could use, say, ages 150 to Inf for this category)
Click to see solution
death_records <- read.csv("/anvil/projects/tdm/data/death_records/DeathRecords.csv")

ages <- death_records$Age

# sort into categories but no labels
age_groups <- cut(ages, breaks = c(0, 18, 25, 35, 55, 150, Inf))

# add labels corresponding to the 6 categories above
second_age_groups <- cut(ages,
                         breaks = c(0, 18, 25, 35, 55, 150, Inf),
                         labels = c("youth", "young adult", "adult", "middle age adult", "senior adult", "unknown"))
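
One detail worth checking with a small example: by default, cut builds intervals that are open on the left and closed on the right, so a value equal to the lowest break falls outside every interval and becomes NA. The sketch below uses made-up ages (toy_ages is a hypothetical vector, not the DeathRecords data) to show this, along with the include.lowest argument.

```r
# Hypothetical ages chosen to sit on or near the break points
toy_ages <- c(0, 18, 19, 55, 56, 999)
toy_breaks <- c(0, 18, 25, 35, 55, 150, Inf)

# Default intervals are (a, b], so the age 0 is not in (0,18] and becomes NA
default_groups <- cut(toy_ages, breaks = toy_breaks)
print(default_groups)

# include.lowest = TRUE closes the first interval on the left: [0,18]
closed_groups <- cut(toy_ages, breaks = toy_breaks, include.lowest = TRUE)
print(closed_groups)
```

Whether this matters depends on the data: if no value equals the lowest break, both calls assign every value to the same interval.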

subset

subset is a function that returns the part of a vector, matrix, or data frame that meets a condition. By default, subset drops rows for which the condition evaluates to NA.

subset does not perform any operation that can’t be accomplished by indexing.
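
To illustrate both points, the sketch below compares subset with logical indexing on a small made-up data frame (toy_df and its columns are hypothetical). The difference shows up when the condition evaluates to NA: subset drops those rows, while plain logical indexing returns a row of NAs unless which or an explicit NA check is used.

```r
# Hypothetical data frame with one missing Age
toy_df <- data.frame(Sex = c("F", "M", "F", "F"),
                     Age = c(75, 60, NA, 999))

# subset drops the row where the condition is NA (the third row)
with_subset <- subset(toy_df, Sex == "F" & Age != 999)

# Plain logical indexing keeps an all-NA row for the NA condition
with_index <- toy_df[toy_df$Sex == "F" & toy_df$Age != 999, ]

# Wrapping the condition in which() discards the NA, like subset does
with_which <- toy_df[which(toy_df$Sex == "F" & toy_df$Age != 999), ]

print(with_subset)
print(with_index)
print(isTRUE(all.equal(with_subset, with_which)))
```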

Examples

In the DeathRecords data file, show the head of the subset of data for which Sex=='F'.

Click to see solution
deathDF <- read.csv("/anvil/projects/tdm/data/death_records/DeathRecords.csv")

femaleSubset <- subset(deathDF, Sex == 'F')

head(femaleSubset)
Id	ResidentStatus	Education1989Revision	Education2003Revision	EducationReportingFlag	MonthOfDeath	Sex	AgeType	Age	AgeSubstitutionFlag	...	CauseRecode39	NumberOfEntityAxisConditions	NumberOfRecordAxisConditions	Race	BridgedRaceFlag	RaceImputationFlag	RaceRecode3	RaceRecode5	HispanicOrigin	HispanicOriginRaceRecode
	<int>	<int>	<int>	<int>	<int>	<int>	<chr>	<int>	<int>	<int>	...	<int>	<int>	<int>	<int>	<int>	<int>	<int>	<int>	<int>	<int>
3	3	1	0	7	1	1	F	1	75	0	...	28	2	2	1	0	0	1	1	100	6
6	6	1	0	5	1	1	F	1	93	0	...	37	5	5	1	0	0	1	1	100	6
9	9	1	0	3	1	1	F	1	86	0	...	37	1	1	1	0	0	1	1	100	6
11	11	1	0	3	1	1	F	1	79	0	...	22	2	2	1	0	0	1	1	100	6
13	13	1	0	4	1	1	F	1	85	0	...	22	5	5	1	0	0	1	1	100	6
14	14	1	0	3	1	1	F	1	84	0	...	8	2	2	1	0	0	1	1	100	6

In the DeathRecords data file, show the head of the subset of data for which Sex=='F' & Age!=999.

Click to see solution
deathDF <- read.csv("/anvil/projects/tdm/data/death_records/DeathRecords.csv")

validFemaleSubset <- subset(deathDF, Sex == 'F' & Age != 999)

head(validFemaleSubset)
Id	ResidentStatus	Education1989Revision	Education2003Revision	EducationReportingFlag	MonthOfDeath	Sex	AgeType	Age	AgeSubstitutionFlag	...	CauseRecode39	NumberOfEntityAxisConditions	NumberOfRecordAxisConditions	Race	BridgedRaceFlag	RaceImputationFlag	RaceRecode3	RaceRecode5	HispanicOrigin	HispanicOriginRaceRecode
	<int>	<int>	<int>	<int>	<int>	<int>	<chr>	<int>	<int>	<int>	...	<int>	<int>	<int>	<int>	<int>	<int>	<int>	<int>	<int>	<int>
3	3	1	0	7	1	1	F	1	75	0	...	28	2	2	1	0	0	1	1	100	6
6	6	1	0	5	1	1	F	1	93	0	...	37	5	5	1	0	0	1	1	100	6
9	9	1	0	3	1	1	F	1	86	0	...	37	1	1	1	0	0	1	1	100	6
11	11	1	0	3	1	1	F	1	79	0	...	22	2	2	1	0	0	1	1	100	6
13	13	1	0	4	1	1	F	1	85	0	...	22	5	5	1	0	0	1	1	100	6
14	14	1	0	3	1	1	F	1	84	0	...	8	2	2	1	0	0	1	1	100	6