R `base` functions

table

table is a function used to build a contingency table, which is a table that shows counts for categorical data, from one or more categories. prop.table is a function that accepts table output, returning proportions of the counts.

Examples

Which value appears in the "STATE" column the most times, in itcont1980.txt?

Click to see solution

library(data.table)
myDF <- fread("/anvil/projects/tdm/data/election/itcont1980.txt", quote="")
names(myDF) <- c("CMTE_ID", "AMNDT_IND", "RPT_TP", "TRANSACTION_PGI", "IMAGE_NUM", "TRANSACTION_TP", "ENTITY_TP", "NAME", "CITY", "STATE", "ZIP_CODE", "EMPLOYER", "OCCUPATION", "TRANSACTION_DT", "TRANSACTION_AMT", "OTHER_ID", "TRAN_ID", "FILE_NUM", "MEMO_CD", "MEMO_TEXT", "SUB_ID")

head(sort(table(myDF$STATE), decreasing=TRUE), n=1)

   CA
3706

Which value appears in the "NAME" column the most times?

Click to see solution

head(sort(table(myDF$NAME), decreasing=TRUE), n=1)

AMERICAN MEDICAL POLITICAL ACTION COMMITTEE
                                        769

`paste` and `paste0`

paste is a function that converts vector elements to character strings and then concatenates them. It has a sep argument (default sep = " ") where the user can include a phrase/string to separate the strings being pasted together

paste0 is a version of paste where its sep argument is "", meaning the strings will be linked with no characters in between.

Examples

Use the paste command to join the "CITY" and "STATE" columns, with the goal of determining the top 5 city-and-state locations where donations were made.

Click to see solution

head(sort(table(paste(myDF$"CITY", myDF$"STATE", sep=", ")), decreasing=TRUE), n=6)

   NEW YORK, NY              ,      HOUSTON, TX      DALLAS, TX  WASHINGTON, DC
          13862           11582           10146            6438            5890
LOS ANGELES, CA
           5866

subset

subset is a function that helps you take subsets of data. By default, subset removes NA rows.

subset does not perform any operation that can’t be accomplished by indexing.

Examples

There are only 9 entries in which the TRANSACTION_DT value is missing, namely: one donation from CURCIO, BARBARA G and two donations from WOLFF, GARY W. and six donations from who?? (find their identity)! Find the name of the person who made six donations in 1980 with a missing TRANSACTION_DT.

Click to see solution

library(data.table)
myDF <- fread("/anvil/projects/tdm/data/election/itcont1980.txt", quote="")
names(myDF) <- c("CMTE_ID", "AMNDT_IND", "RPT_TP", "TRANSACTION_PGI", "IMAGE_NUM", "TRANSACTION_TP", "ENTITY_TP", "NAME", "CITY", "STATE", "ZIP_CODE", "EMPLOYER", "OCCUPATION", "TRANSACTION_DT", "TRANSACTION_AMT", "OTHER_ID", "TRAN_ID", "FILE_NUM", "MEMO_CD", "MEMO_TEXT", "SUB_ID")

missingDF <- subset(myDF, is.na(TRANSACTION_DT))

donorCounts <- table(missingDF$NAME)

name <- names(donorCounts[donorCounts == 6])

print(donorCounts)
print(name)

      CURCIO, BARBARA G SCHECK, RICHARD HERBERT          WOLFF, GARY W.
                      1                       6                       2
[1] "SCHECK, RICHARD HERBERT"

Using itcont1980.txt, give the 9 NAME values for which the TRANSACTION_DT value is missing, using indexing instead of the subset function.

Click to see solution

library(data.table)
myDF <- fread("/anvil/projects/tdm/data/election/itcont1980.txt", quote="")
names(myDF) <- c("CMTE_ID", "AMNDT_IND", "RPT_TP", "TRANSACTION_PGI", "IMAGE_NUM", "TRANSACTION_TP", "ENTITY_TP", "NAME", "CITY", "STATE", "ZIP_CODE", "EMPLOYER", "OCCUPATION", "TRANSACTION_DT", "TRANSACTION_AMT", "OTHER_ID", "TRAN_ID", "FILE_NUM", "MEMO_CD", "MEMO_TEXT", "SUB_ID")

myDF$NAME[is.na(myDF$TRANSACTION_DT)]

    'SCHECK, RICHARD HERBERT'
    'SCHECK, RICHARD HERBERT'
    'SCHECK, RICHARD HERBERT'
    'SCHECK, RICHARD HERBERT'
    'SCHECK, RICHARD HERBERT'
    'SCHECK, RICHARD HERBERT'
    'CURCIO, BARBARA G'
    'WOLFF, GARY W.'
    'WOLFF, GARY W.'

Using itcont1980.txt, using the subset function to get a data frame that contains only the donations for which the STATE is IN. From the CITY column of this subset, make a table of the number of occurrences of each CITY. Sort the table and print the largest 41 entries.

Click to see solution

library(data.table)
myDF <- fread("/anvil/projects/tdm/data/election/itcont1980.txt", quote="")
names(myDF) <- c("CMTE_ID", "AMNDT_IND", "RPT_TP", "TRANSACTION_PGI", "IMAGE_NUM", "TRANSACTION_TP", "ENTITY_TP", "NAME", "CITY", "STATE", "ZIP_CODE", "EMPLOYER", "OCCUPATION", "TRANSACTION_DT", "TRANSACTION_AMT", "OTHER_ID", "TRAN_ID", "FILE_NUM", "MEMO_CD", "MEMO_TEXT", "SUB_ID")

IN_donations <- subset(myDF, STATE == "IN")

cities <- table(IN_donations$CITY)

sorted_cities <- sort(cities, decreasing = TRUE)

print(head(sorted_cities, n=41))

  INDIANAPOLIS         MUNCIE         CARMEL     FORT WAYNE     EVANSVILLE
          1443            232            167            157            151
       ELKHART       FT WAYNE         INDPLS     SOUTH BEND      LAFAYETTE
           132            132            123            123             85
   TERRE HAUTE        LAPORTE     ZIONSVILLE           GARY   MERRILLVILLE
            75             53             52             47             46
      COLUMBUS         KOKOMO       RICHMOND    BLOOMINGTON    CROWN POINT
            45             45             43             38             35
     GREENWOOD       ANDERSON     VALPARAISO         MUNICE        MUNSTER
            35             34             34             31             30
        WABASH       SYRACUSE WEST LAFAYETTE        HAMMOND    NOBLESVILLE
            27             24             24             22             22
   W LAFAYETTE         WARSAW      VINCENNES         HOBART       HIGHLAND
            22             21             20             17             16
 MICHIGAN CITY      MISHAWAKA     CHESTERTON    INDIANPOLIS     PLAINFIELD
            16             16             15             15             15
     WALKERTON
            15

Using itcont1980.txt, using indexing (not a subset), give a table of the top 41 cities in Indiana, according to the number of donations from people in that city.

Click to see solution

library(data.table)
myDF <- fread("/anvil/projects/tdm/data/election/itcont1980.txt", quote="")
names(myDF) <- c("CMTE_ID", "AMNDT_IND", "RPT_TP", "TRANSACTION_PGI", "IMAGE_NUM", "TRANSACTION_TP", "ENTITY_TP", "NAME", "CITY", "STATE", "ZIP_CODE", "EMPLOYER", "OCCUPATION", "TRANSACTION_DT", "TRANSACTION_AMT", "OTHER_ID", "TRAN_ID", "FILE_NUM", "MEMO_CD", "MEMO_TEXT", "SUB_ID")

IN_donations_2 <- myDF[myDF$STATE == "IN"]

cities_2 <- table(IN_donations_2$CITY)

sorted_cities_2 <- sort(cities_2, decreasing = TRUE)

print(head(sort(sorted_cities_2, decreasing = TRUE), n=41))

  INDIANAPOLIS         MUNCIE         CARMEL     FORT WAYNE     EVANSVILLE
          1443            232            167            157            151
       ELKHART       FT WAYNE         INDPLS     SOUTH BEND      LAFAYETTE
           132            132            123            123             85
   TERRE HAUTE        LAPORTE     ZIONSVILLE           GARY   MERRILLVILLE
            75             53             52             47             46
      COLUMBUS         KOKOMO       RICHMOND    BLOOMINGTON    CROWN POINT
            45             45             43             38             35
     GREENWOOD       ANDERSON     VALPARAISO         MUNICE        MUNSTER
            35             34             34             31             30
        WABASH       SYRACUSE WEST LAFAYETTE        HAMMOND    NOBLESVILLE
            27             24             24             22             22
   W LAFAYETTE         WARSAW      VINCENNES         HOBART       HIGHLAND
            22             21             20             17             16
 MICHIGAN CITY      MISHAWAKA     CHESTERTON    INDIANPOLIS     PLAINFIELD
            16             16             15             15             15
     WALKERTON
            15

sum

sum is a function that calculates the sum of a vector of values.

Examples

Use `fread` to read only the 10th and 15th fields of itcont2020.txt. List the top 10 states according to the sum of the donation amounts, and the sum of the donation amounts in each of these top 10 states.

Click to see solution

donations <- fread("/anvil/projects/tdm/data/election/itcont2020.txt", select = c(10, 15), col.names = c("state", "amount"))

donations[, .(total = sum(amount)), by = state][order(-total)][1:10]

state	total
CA	2803518256
NY	2521628477
VA	1122819081
DC	1060820708
TX	1005943807
FL	913762179
IL	610913249
MA	521866708
NV	384867853
WA	384169499

R base functions

table

Examples

Which value appears in the "STATE" column the most times, in itcont1980.txt?

Which value appears in the "NAME" column the most times?

paste and paste0

Examples

Use the paste command to join the "CITY" and "STATE" columns, with the goal of determining the top 5 city-and-state locations where donations were made.

subset

Examples

There are only 9 entries in which the TRANSACTION_DT value is missing, namely: one donation from CURCIO, BARBARA G and two donations from WOLFF, GARY W. and six donations from who?? (find their identity)! Find the name of the person who made six donations in 1980 with a missing TRANSACTION_DT.

Using itcont1980.txt, give the 9 NAME values for which the TRANSACTION_DT value is missing, using indexing instead of the subset function.

Using itcont1980.txt, using the subset function to get a data frame that contains only the donations for which the STATE is IN. From the CITY column of this subset, make a table of the number of occurrences of each CITY. Sort the table and print the largest 41 entries.

Using itcont1980.txt, using indexing (not a subset), give a table of the top 41 cities in Indiana, according to the number of donations from people in that city.

sum

Examples

Use fread to read only the 10th and 15th fields of itcont2020.txt. List the top 10 states according to the sum of the donation amounts, and the sum of the donation amounts in each of these top 10 states.

R `base` functions

`paste` and `paste0`

Use `fread` to read only the 10th and 15th fields of itcont2020.txt. List the top 10 states according to the sum of the donation amounts, and the sum of the donation amounts in each of these top 10 states.