Apply Functions
tapply
The documentation's definition of tapply is a bit more specific than those of the other apply functions. Its arguments are (X, INDEX, FUN): X is an object to which the split function can be applied (typically a vector), INDEX is a factor by which X is grouped, and FUN is a function, as before.
Put more simply, tapply splits X into groups according to INDEX and applies FUN to each group.
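As a minimal sketch of this definition, the toy example below groups a small vector by a second vector and takes the mean of each group. The names heights and teams and their values are made up for illustration and do not come from the Olympics data. The second call shows the same computation written out with split and sapply, which is roughly what tapply does for a single grouping factor.

```r
# Toy data (hypothetical values, not from athlete_events.csv)
heights <- c(180, 175, 190, 165, 170)
teams   <- c("A", "B", "A", "B", "B")

# Group heights by team, then take the mean of each group
result <- tapply(heights, teams, mean)
print(result)
#   A   B 
# 185 170 

# The same idea spelled out: split X by INDEX, then apply FUN to each piece
by_split <- sapply(split(heights, teams), mean)
print(by_split)
```

Both calls return one value per group, labelled with the group name.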
Examples
In athlete_events.csv, find the average height of the athletes in each country (the country is the NOC column).
Solution
# read in data
library(data.table)
myDF <- fread("/anvil/projects/tdm/data/olympics/athlete_events.csv")
# mean, not sum, gives the average height per country
tapply(myDF$Height, myDF$NOC, mean, na.rm=TRUE)
For comparison, using sum in place of mean gives the total height recorded for each country:
tapply(myDF$Height, myDF$NOC, sum, na.rm=TRUE)
     AFG     AHO     ALB     ALG     AND     ANG     ANT     ANZ     ARG     ARM
    9212    9042    9861   85255   23450   43660   20139    4595  403895   35935
     ARU     ASA     AUS     AUT     AZE     BAH     BAN     BAR     BDI     BEL
    7102    3689 1191954  575742   47670   62814    7977   33907    6560  316177
     BEN     BER     BHU     BIH     BIZ     BLR     BOH     BOL     BOT     BRA
   10421   24482    4829   20007   12779  297630    1044   19699   14768  597749
     BRN     BRU     BUL     BUR     CAF     CAM     CAN     CAY     CGO     CHA
   18383    1495  523332    6512    8426    9503 1422227   13104   14316    6003
     CHI     CHN     CIV     CMR     COD     COK     COL     COM     CPV     CRC
   94143  843337   28050   45145   11749    6339  166347    2563    3166   36604
     CRO     CRT     CUB     CYP     CZE     DEN     DJI     DMA     DOM     ECU
  153317       0  386897   33745  333355  323637    5376    3148   36510   44326
     EGY     ERI     ESA     ESP     EST     ETH     EUN     FIJ     FIN     FRA
  155129    6288   29334  766507  144338   59921  127136   36726  779838 1449532
     FRG     FSM     GAB     GAM     GBR     GBS     GDR     GEO     GEQ     GER
  581244    4031    9947    6822 1395588    3390  459949   49878    3916 1315364
     GHA     GRE     GRN     GUA     GUI     GUM     GUY     HAI     HKG     HON
   52686  339424    6306   56308   10719   14480   14656   12523  107634   24123
     HUN     INA     IND     IOA     IRI     IRL     IRQ     ISL     ISR     ISV
  831216   56946  152930   12038  119247  185445   28344   79023  102388   42704
     ITA     IVB     JAM     JOR     JPN     KAZ     KEN     KGZ     KIR     KOR
 1410280    6788  143225   11950 1271132  237756  120731   35615    1836  668190
     KOS     KSA     KUW     LAO     LAT     LBA     LBR     LCA     LES     LIB
    1388   32378   38739    8488  140482    9771    9375    4722    8530   38660
     LIE     LTU     LUX     MAD     MAL     MAR     MAS     MAW     MDA     MDV
   48378  109806   73722   18868    2007  102420   78937   13276   39076    6878
     MEX     MGL     MHL     MKD     MLI     MLT     MNE     MON     MOZ     MRI
  418606   82368    2191   12767   11017   14613   17255   24324    9840   19352
     MTN     MYA     NAM     NBO     NCA     NED     NEP     NFL     NGR     NIG
    4861   15614   12273       0   13743  679545   13932     170  130864    4310
     NOR     NRU     NZL     OMA     PAK     PAN     PAR     PER     PHI     PLE
  593994    1839  368114   10858   63825   20454   17835   70980   89561    3054
     PLW     PNG     POL     POR     PRK     PUR     QAT     RHO     ROT     ROU
    3971   17213 1007774  194348  104528  129662   30672    2032    2039  604973
     RSA     RUS     RWA     SAA     SAM     SCG     SEN     SEY     SGP     SKN
  192186  853054    6045     479    8042   57566   61441   17550   44651    6833
     SLE     SLO     SMR     SOL     SOM     SRB     SRI     SSD     STP     SUD
   17881  182898   27715    2200    4752   72377   20864     520    2585   15999
     SUI     SUR     SVK     SWE     SWZ     SYR     TAN     TCH     TGA     THA
  734529    9768  185297  967471    9434   13433   25276  481899    6455  113134
     TJK     TKM     TLS     TOG     TPE     TTO     TUN     TUR     TUV     UAE
   11347    9182    1453    9153  178782   61587   94263  167905     663   22445
     UAR     UGA     UKR     UNK     URS     URU     USA     UZB     VAN     VEN
     362   38369  425540       0  869189   60842 2622879   80615    5248  137748
     VIE     VIN     VNM     WIF     YAR     YEM     YMD     YUG     ZAM     ZIM
   24110    4072    6299    3560    1674    4199     350  294766   22122   49929
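The na.rm=TRUE argument above is not an argument of tapply itself; extra arguments are passed along to FUN. The sketch below illustrates why it matters, using a small made-up vector with one missing height (the names heights and nocs and all values are hypothetical, not taken from athlete_events.csv).

```r
# Toy data with a missing value (hypothetical, not from the Olympics file)
heights <- c(180, NA, 190, 165, 170)
nocs    <- c("USA", "USA", "USA", "FRA", "FRA")

# Without na.rm, any group that contains an NA yields NA
with_na <- tapply(heights, nocs, mean)

# Extra arguments such as na.rm = TRUE are passed through to FUN (here, mean)
without_na <- tapply(heights, nocs, mean, na.rm = TRUE)

print(with_na)
print(without_na)
```

In the first result the USA group is NA; in the second, the missing height is dropped before averaging.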
Find the average height of the athletes in each sport. After finding these average heights, please sort your results.
Solution
# column names are case-sensitive: the columns are Height and Sport
sort(tapply(myDF$Height, myDF$Sport, mean, na.rm=TRUE), decreasing = TRUE)
Basketball: 190.8699
Volleyball: 186.9948
Beach Volleyball: 186.1450
Water Polo: 184.8346
Rowing: 184.1722
Handball: 183.3846
Baseball: 182.5993
Tug-Of-War: 182.4800
Bobsleigh: 181.4375
Motorboating: 181.0000
Ice Hockey: 178.9013
Tennis: 178.8981
Swimming: 178.5625
Canoeing: 178.5396
Jeu De Paume: 178.5000
Sailing: 178.2622
Modern Pentathlon: 177.9443
Fencing: 177.1642
Taekwondo: 176.7508
Luge: 176.6577
Nordic Combined: 176.5045
Ski Jumping: 176.4028
Athletics: 176.2563
Skeleton: 176.1886
Cycling: 176.1088
Rugby: 176.0968
Racquets: 176.0000
Polo: 175.5000
Football: 175.4022
Rugby Sevens: 175.3636
Art Competitions: 174.6441
Equestrianism: 174.3753
Golf: 174.2901
Curling: 174.2031
Judo: 174.1874
Badminton: 174.1788
Speed Skating: 174.0834
Biathlon: 174.0348
Lacrosse: 174.0000
Triathlon: 173.6458
Shooting: 173.5722
Alpine Skiing: 173.4891
Hockey: 173.3597
Cross Country Skiing: 173.2492
Archery: 173.2031
Snowboarding: 173.0856
Boxing: 172.8257
Wrestling: 172.3586
Table Tennis: 171.2538
Freestyle Skiing: 171.0130
Short Track Speed Skating: 170.1082
Softball: 169.3951
Synchronized Swimming: 168.4815
Figure Skating: 168.2022
Rhythmic Gymnastics: 167.8703
Weightlifting: 167.8248
Diving: 166.6343
Trampolining: 166.5828
Gymnastics: 162.9360
In which sport are the athletes the tallest (on average)? Does this make sense intuitively, i.e., is height an advantage in this sport?
Solution
# just the sport with the tallest average height, separated out for clarity
head(sort(tapply(myDF$Height, myDF$Sport, mean, na.rm=TRUE), decreasing = TRUE), n=1)
Basketball: 190.869878897191
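Sorting and taking the first element is one way to find the top group. As an alternative sketch, which.max can pick out the largest group mean directly without sorting. The toy vectors below are made up for illustration (hypothetical values, not from athlete_events.csv).

```r
# Toy data (hypothetical values, not from the Olympics file)
heights <- c(200, 190, 160, 165, 180)
sports  <- c("Basketball", "Basketball", "Gymnastics", "Gymnastics", "Tennis")

# Average height per sport
avg <- tapply(heights, sports, mean)

# which.max returns the position of the largest mean, labelled with its group name
tallest <- names(which.max(avg))
print(tallest)
# [1] "Basketball"
```

On the real data, the same pattern would be applied to tapply(myDF$Height, myDF$Sport, mean, na.rm=TRUE).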