Apply Functions
tapply
The documentation definition for tapply
is a bit more specific than the others, where the arguments are now (X, INDEX, FUN)
, with X
being an object where the split
function applies, INDEX
is a factor by which X
is grouped, and FUN
is function as before.
To simplify this definition, we can say tapply
applies FUN
to X
when X
is grouped by INDEX
.
Examples
Using the Iowa liquor sales file, use fread
to read all 27 million rows of the data set again, but this time, only read in the columns "Zip Code", "Category Name", "Sale (Dollars)." Find the 10 "Zip Code" values that have the largest sum of "Sale (Dollars)" altogether, and give those "Zip Code" values and each of their sums of "Sale (Dollars)".
Click to see solution
# read in data
iowa2 <- fread("/anvil/projects/tdm/data/iowa_liquor_sales/iowa_liquor_sales.csv", select=c("Zip Code", "Category Name", "Sale (Dollars)"))
zip_sales <- tapply(iowa2$`Sale (Dollars)`, iowa2$`Zip Code`, sum)
head(sort(zip_sales, decreasing=TRUE), 10)
50320 132861227.43 52402 108460935.17 52240 106827908.74 50266 95956448.74 51501 84485599.04 52241 80224356.18 50613 70716357.28 50311 65407916.64 52722 63447651.28 50021 61328202.38
Using the Iowa liquor sales file, find the 10 "Category Name" values that have the largest sum of "Sale (Dollars)" altogether, and give those "Category Name" values and each of their sums of "Sale (Dollars)".
Click to see solution
# read in data
iowa2 <- fread("/anvil/projects/tdm/data/iowa_liquor_sales/iowa_liquor_sales.csv", select=c("Zip Code", "Category Name", "Sale (Dollars)"))
category_sales <- tapply(iowa2$`Sale (Dollars)`, iowa2$`Category Name`, sum)
head(sort(category_sales, decreasing=TRUE), 10)
CANADIAN WHISKIES 457612891.06 AMERICAN VODKAS 380307151.309999 STRAIGHT BOURBON WHISKIES 257794861.83 SPICED RUM 254362805.42 WHISKEY LIQUEUR 199736754.69 IMPORTED VODKAS 183082358.92 TENNESSEE WHISKIES 162676709.12 100% AGAVE TEQUILA 124223944.31 BLENDED WHISKIES 109152590.55 IMPORTED BRANDIES 88413645.9