Apply Functions
tapply
The documentation for tapply is a bit more specific than for the other apply functions. Its arguments are (X, INDEX, FUN):
- X is an object that split can be applied to, typically a vector.
- INDEX is a factor (or a list of factors) by which X is grouped.
- FUN is the function to apply, as before.
Put simply, tapply splits X into groups defined by INDEX and applies FUN to each group.
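Here is a minimal sketch of that idea on a small made-up vector. The names `heights`, `team`, `by_team` and `by_split` are hypothetical and do not come from the Olympics data. tapply groups X by INDEX and applies FUN to each group. The result matches doing the split explicitly and then applying FUN to each piece:

```r
# hypothetical toy data: five heights (cm) and the team each athlete belongs to
heights <- c(180, 170, 190, 160, 175)
team    <- c("A", "B", "A", "B", "C")

# tapply groups heights by team, then applies mean to each group
by_team <- tapply(heights, team, mean)
print(by_team)

# the same computation in two explicit steps: split into groups, then apply mean
by_split <- sapply(split(heights, team), mean)
print(by_split)
```

Both give A = 185, B = 165 and C = 175. The groups come out in sorted order because INDEX is converted to a factor.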
Examples
In athlete_events.csv, find the average height of the athletes in each country (the country is the NOC column).
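library(data.table)
# read in data
myDF <- fread("/anvil/projects/tdm/data/olympics/athlete_events.csv")
# average height of the athletes in each country (NOC); na.rm=TRUE skips missing heights
tapply(myDF$Height, myDF$NOC, mean, na.rm=TRUE)
# swapping mean for sum gives the total height per NOC, which is what the output below shows
tapply(myDF$Height, myDF$NOC, sum, na.rm=TRUE)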
AFG AHO ALB ALG AND ANG ANT ANZ ARG ARM
9212 9042 9861 85255 23450 43660 20139 4595 403895 35935
ARU ASA AUS AUT AZE BAH BAN BAR BDI BEL
7102 3689 1191954 575742 47670 62814 7977 33907 6560 316177
BEN BER BHU BIH BIZ BLR BOH BOL BOT BRA
10421 24482 4829 20007 12779 297630 1044 19699 14768 597749
BRN BRU BUL BUR CAF CAM CAN CAY CGO CHA
18383 1495 523332 6512 8426 9503 1422227 13104 14316 6003
CHI CHN CIV CMR COD COK COL COM CPV CRC
94143 843337 28050 45145 11749 6339 166347 2563 3166 36604
CRO CRT CUB CYP CZE DEN DJI DMA DOM ECU
153317 0 386897 33745 333355 323637 5376 3148 36510 44326
EGY ERI ESA ESP EST ETH EUN FIJ FIN FRA
155129 6288 29334 766507 144338 59921 127136 36726 779838 1449532
FRG FSM GAB GAM GBR GBS GDR GEO GEQ GER
581244 4031 9947 6822 1395588 3390 459949 49878 3916 1315364
GHA GRE GRN GUA GUI GUM GUY HAI HKG HON
52686 339424 6306 56308 10719 14480 14656 12523 107634 24123
HUN INA IND IOA IRI IRL IRQ ISL ISR ISV
831216 56946 152930 12038 119247 185445 28344 79023 102388 42704
ITA IVB JAM JOR JPN KAZ KEN KGZ KIR KOR
1410280 6788 143225 11950 1271132 237756 120731 35615 1836 668190
KOS KSA KUW LAO LAT LBA LBR LCA LES LIB
1388 32378 38739 8488 140482 9771 9375 4722 8530 38660
LIE LTU LUX MAD MAL MAR MAS MAW MDA MDV
48378 109806 73722 18868 2007 102420 78937 13276 39076 6878
MEX MGL MHL MKD MLI MLT MNE MON MOZ MRI
418606 82368 2191 12767 11017 14613 17255 24324 9840 19352
MTN MYA NAM NBO NCA NED NEP NFL NGR NIG
4861 15614 12273 0 13743 679545 13932 170 130864 4310
NOR NRU NZL OMA PAK PAN PAR PER PHI PLE
593994 1839 368114 10858 63825 20454 17835 70980 89561 3054
PLW PNG POL POR PRK PUR QAT RHO ROT ROU
3971 17213 1007774 194348 104528 129662 30672 2032 2039 604973
RSA RUS RWA SAA SAM SCG SEN SEY SGP SKN
192186 853054 6045 479 8042 57566 61441 17550 44651 6833
SLE SLO SMR SOL SOM SRB SRI SSD STP SUD
17881 182898 27715 2200 4752 72377 20864 520 2585 15999
SUI SUR SVK SWE SWZ SYR TAN TCH TGA THA
734529 9768 185297 967471 9434 13433 25276 481899 6455 113134
TJK TKM TLS TOG TPE TTO TUN TUR TUV UAE
11347 9182 1453 9153 178782 61587 94263 167905 663 22445
UAR UGA UKR UNK URS URU USA UZB VAN VEN
362 38369 425540 0 869189 60842 2622879 80615 5248 137748
VIE VIN VNM WIF YAR YEM YMD YUG ZAM ZIM
24110 4072 6299 3560 1674 4199 350 294766 22122 49929
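In the totals above, a few codes (CRT, NBO, UNK) show 0. Heights are positive, so a total of 0 can only mean that every height in that group is missing.

With na.rm=TRUE, the missing values are dropped before FUN runs, so FUN receives an empty vector:
- sum of an empty vector is 0.
- mean of an empty vector is NaN.

A small sketch with made-up values illustrates this. The names are hypothetical:

```r
# hypothetical toy data: team "C" has no recorded heights at all
heights <- c(180, NA, 190, 160, NA)
team    <- c("A", "B", "A", "B", "C")

# extra arguments such as na.rm are passed on to FUN
totals  <- tapply(heights, team, sum,  na.rm = TRUE)
average <- tapply(heights, team, mean, na.rm = TRUE)
print(totals)   # C is 0, the sum of an empty vector
print(average)  # C is NaN, the mean of an empty vector
```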
Find the average height of the athletes in each sport. After finding these average heights, please sort your results.
sort(tapply(myDF$Height, myDF$Sport, mean, na.rm=TRUE), decreasing = TRUE)
Basketball Volleyball Beach Volleyball
190.8699 186.9948 186.1450
Water Polo Rowing Handball
184.8346 184.1722 183.3846
Baseball Tug-Of-War Bobsleigh
182.5993 182.4800 181.4375
Motorboating Ice Hockey Tennis
181.0000 178.9013 178.8981
Swimming Canoeing Jeu De Paume
178.5625 178.5396 178.5000
Sailing Modern Pentathlon Fencing
178.2622 177.9443 177.1642
Taekwondo Luge Nordic Combined
176.7508 176.6577 176.5045
Ski Jumping Athletics Skeleton
176.4028 176.2563 176.1886
Cycling Rugby Racquets
176.1088 176.0968 176.0000
Polo Football Rugby Sevens
175.5000 175.4022 175.3636
Art Competitions Equestrianism Golf
174.6441 174.3753 174.2901
Curling Judo Badminton
174.2031 174.1874 174.1788
Speed Skating Biathlon Lacrosse
174.0834 174.0348 174.0000
Triathlon Shooting Alpine Skiing
173.6458 173.5722 173.4891
Hockey Cross Country Skiing Archery
173.3597 173.2492 173.2031
Snowboarding Boxing Wrestling
173.0856 172.8257 172.3586
Table Tennis Freestyle Skiing Short Track Speed Skating
171.2538 171.0130 170.1082
Softball Synchronized Swimming Figure Skating
169.3951 168.4815 168.2022
Rhythmic Gymnastics Weightlifting Diving
167.8703 167.8248 166.6343
Trampolining Gymnastics
166.5828 162.9360
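By default, sort removes NA and NaN values, because its na.last argument defaults to NA. Any group whose mean is NaN, such as a sport with no recorded heights, is therefore silently dropped from a sorted result like the one above.

Here is a sketch with made-up group means. The names `means`, `sorted_default` and `sorted_keep` are hypothetical:

```r
# hypothetical group means; "D" stands in for a group with no recorded values
means <- c(A = 185, B = 160, C = 172, D = NaN)

# default na.last = NA: the NaN entry is removed, so D disappears
sorted_default <- sort(means, decreasing = TRUE)
print(sorted_default)

# na.last = TRUE keeps missing entries and places them at the end
sorted_keep <- sort(means, decreasing = TRUE, na.last = TRUE)
print(sorted_keep)
```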
In which sport are the athletes tallest (on average)? Does this make sense intuitively, i.e., is height an advantage in this sport?
# just the sport with the tallest athletes on average, separated for clarity
head(sort(tapply(myDF$Height, myDF$Sport, mean, na.rm=TRUE), decreasing = TRUE), n=1)
Basketball: 190.869878897191
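If only the name of the tallest sport is needed, you can call which.max on the unsorted tapply result instead of sorting the whole vector. which.max returns the position of the first maximum with the group name attached. Like sort, it ignores NaN values.

A sketch on made-up data follows. The names `heights`, `sport`, `avg` and `tallest` are hypothetical:

```r
# hypothetical toy data: heights (cm) and sport labels
heights <- c(200, 190, 165, 170, NA, 160)
sport   <- c("Basketball", "Basketball", "Gymnastics", "Gymnastics", "Curling", "Gymnastics")

# Curling has only a missing height, so its mean is NaN
avg <- tapply(heights, sport, mean, na.rm = TRUE)

# which.max skips the NaN and returns the index of the largest mean, named by group
tallest <- names(which.max(avg))
print(tallest)
```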