R base functions

table

table is a function used to build a contingency table, which is a table that shows counts for categorical data, from one or more categories. prop.table is a function that accepts table output, returning proportions of the counts.

Examples

Which value appears in the "STATE" column the most times, in itcont1980.txt?

Click to see solution
library(data.table)
myDF <- fread("/anvil/projects/tdm/data/election/itcont1980.txt", quote="")
names(myDF) <- c("CMTE_ID", "AMNDT_IND", "RPT_TP", "TRANSACTION_PGI", "IMAGE_NUM", "TRANSACTION_TP", "ENTITY_TP", "NAME", "CITY", "STATE", "ZIP_CODE", "EMPLOYER", "OCCUPATION", "TRANSACTION_DT", "TRANSACTION_AMT", "OTHER_ID", "TRAN_ID", "FILE_NUM", "MEMO_CD", "MEMO_TEXT", "SUB_ID")

head(sort(table(myDF$STATE), decreasing=TRUE), n=1)
   CA
3706

Which value appears in the "NAME" column the most times?

Click to see solution
head(sort(table(myDF$NAME), decreasing=TRUE), n=1)
AMERICAN MEDICAL POLITICAL ACTION COMMITTEE
                                        769

paste and paste0

paste is a function that converts vector elements to character strings and then concatenates them. It has a sep argument (default sep = " ") where the user can include a phrase/string to separate the strings being pasted together

paste0 is a version of paste where its sep argument is "", meaning the strings will be linked with no characters in between.

Examples

Use the paste command to join the "CITY" and "STATE" columns, with the goal of determining the top 5 city-and-state locations where donations were made.

Click to see solution
head(sort(table(paste(myDF$"CITY", myDF$"STATE", sep=", ")), decreasing=TRUE), n=6)
   NEW YORK, NY              ,      HOUSTON, TX      DALLAS, TX  WASHINGTON, DC
          13862           11582           10146            6438            5890
LOS ANGELES, CA
           5866