Apply Functions

tapply

The documentation definition for tapply is a bit more specific than the others, where the arguments are now (X, INDEX, FUN), with X being an object where the split function applies, INDEX is a factor by which X is grouped, and FUN is function as before.

To simplify this definition, we can say tapply applies FUN to X when X is grouped by INDEX.

Examples

Using the 5000_transactions csv file, find the sum of the amount spent (in the SPEND column) at each of the store regions (the STORE_R column)

Click to see solution
# read in data
library(data.table)
myDF <- fread("/anvil/projects/tdm/data/8451/The_Complete_Journey_2_Master/5000_transactions.csv")

tapply(myDF$SPEND, myDF$STORE_R, sum, na.rm=TRUE)
CENTRAL
    8897305.13999992
EAST
    11699446.8599998
SOUTH
    7957920.76999994
WEST
    9680106.5399999

In the 5000_transactions csv file, find the total amount of money spent in 2016 altogether, and the total amount of money spent in 2017 altogether

Click to see solution
tapply(myDF$SPEND, myDF$YEAR, sum, na.rm=TRUE)
2016
    19051720.0099997
2017
    19183059.2999997

In the 5000_transactions csv file, show the top 10 types of PRODUCT_NUM, according to the total amount of money spent on those products (i.e., according to the sum of the SPEND column for those 10 PRODUCT_NUM values).

Click to see solution
library(data.table)
myDF <- fread("/anvil/projects/tdm/data/8451/The_Complete_Journey_2_Master/5000_transactions.csv")

tail(sort(tapply(myDF$SPEND, myDF$PRODUCT_NUM, sum)), n=10)
89415
    50032.42
8523
    53845.65
1344763
    58170.84
4889358
    63823.61
85201
    65605.34
766108
    66085
74424
    75787.49
85311
    102928.59
1367192
    111433.78
8511
    131399.78

In the 5000_transactions csv file, show the sum of the values in the SPEND column according to the 8 possible pairs of YEAR and STORE_R values.

Click to see solution
library(data.table)
myDF <- fread("/anvil/projects/tdm/data/8451/The_Complete_Journey_2_Master/5000_transactions.csv")

sum <- tapply(transactions$SPEND, list(transactions$YEAR, transactions$STORE_R) sum)
print(sum)
     CENTRAL EAST    SOUTH   WEST
2016 4471801 5829166 3996751 4754003
2017 4425505 5870281 3961170 4926104