TDM 10100: R Project 8 — 2024
Motivation: We will learn about how user-defined functions work in R.
Context: Although R has lots of built-in functions, we can design our own functions too!
Scope: We start with some basic one-line functions to demonstrate how powerful they are.
Dataset(s)
This project will use the following dataset(s):
- /anvil/projects/tdm/data/death_records/DeathRecords.csv
- /anvil/projects/tdm/data/beer/reviews_sample.csv
- /anvil/projects/tdm/data/election/itcont1980.txt
- /anvil/projects/tdm/data/flights/subset/1990.csv
- /anvil/projects/tdm/data/olympics/athlete_events.csv
Example 1:
Finding the average weight of Olympic athletes in a given country.
avgweights <- function(x) {mean(myDF$Weight[myDF$NOC == x], na.rm = TRUE)}
Example 2:
Finding the percentages of school metro types in a given state.
myschoolpercentages <- function(x) {prop.table(table(myDF$"School Metro Type"[myDF$"School State" == x]))}
Example 3:
In the 1980 election data, finding the sum of the donations in a given state.
mystatesum <- function(x) {sum(myDF$TRANSACTION_AMT[myDF$STATE == x])}
Example 4:
Finding the average number of stars for a given author of reviews.
myauthoravgstars <- function(x) {mean(myDF$stars[myDF$author == x])}
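The four examples above share one pattern: select the entries of one column whose rows match the argument x, then summarize them. As a minimal sketch, here is Example 1 run on a small made-up data frame. The values below are invented for illustration and are not taken from the Olympics data.

```r
# Toy data frame standing in for the Olympics data (values are made up)
myDF <- data.frame(
  NOC    = c("USA", "USA", "CAN", "CAN", "CAN"),
  Weight = c(70, 80, 60, NA, 64)
)

# Same definition as Example 1; the function looks up myDF when it is called
avgweights <- function(x) {mean(myDF$Weight[myDF$NOC == x], na.rm = TRUE)}

avgweights("USA")   # mean of 70 and 80
avgweights("CAN")   # mean of 60 and 64; the NA is dropped by na.rm = TRUE
```

Because the function refers to myDF by name, it uses whichever data frame is called myDF at the time the function is called.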
Questions
As before, please use the |
Question 1 (2 pts)
Consider this user-defined function, which makes a table that shows the percentages of values in each category:
makeatable <- function(x) {prop.table(table(x, useNA="always"))}
If we do something like this, with a column from a data frame:
makeatable(myDF$mycolumn)
Then it is the same as running this:
prop.table(table(myDF$mycolumn, useNA="always"))
In other words, makeatable is a user-defined function that makes a table, including all NA values, and expresses the result as proportions of the total (multiply by 100 to read them as percentages). That is what prop.table does here.
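To see what makeatable returns, it may help to try it first on a short vector. This is only a sketch; the vector v is made up and is not part of the DeathRecords data.

```r
makeatable <- function(x) {prop.table(table(x, useNA="always"))}

# Made-up vector with one missing value
v <- c("F", "M", "M", NA)

result <- makeatable(v)
result        # one entry each for F, M, and the NA category

# The entries are fractions of the total, so they add up to 1
sum(result)
```

With useNA="always", the NA category is listed even when a column has no missing values.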
Now consider the DeathRecords data set:
/anvil/projects/tdm/data/death_records/DeathRecords.csv
- Try the function makeatable on the Sex column of the DeathRecords.
- Also try the function makeatable on the MaritalStatus column of the DeathRecords.

- Use the makeatable function to display a table of values from the Sex column of the DeathRecords.
- Use the makeatable function to display a table of values from the MaritalStatus column of the DeathRecords.
Question 2 (2 pts)
Define a function called teenagecount as follows:
teenagecount <- function(x) {length(x[(x >= 13) & (x <= 19) & (!is.na(x))])}
- Try this function on the Age column of the DeathRecords.
- Also try this function on the Age column of the file /anvil/projects/tdm/data/olympics/athlete_events.csv

- Display the number of teenagers in the DeathRecords data.
- Display the number of teenagers in the Olympics Athlete Events data.
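The role of the !is.na(x) condition in teenagecount is easiest to see on a short vector. The sketch below uses a made-up vector called ages, not data from either file.

```r
teenagecount <- function(x) {length(x[(x >= 13) & (x <= 19) & (!is.na(x))])}

# Made-up ages, including one missing value
ages <- c(12, 13, 16, 19, 20, NA, 45)

# Counts 13, 16, and 19; the NA is excluded by the !is.na(x) condition
teenagecount(ages)

# Without the NA check, the NA is kept by the indexing and inflates the count
length(ages[(ages >= 13) & (ages <= 19)])
```

Comparing against NA gives NA rather than FALSE, and indexing with NA keeps an NA entry, which is why the explicit check is needed.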
Question 3 (2 pts)
The nchar function gives the number of characters in a string. The which.max function finds the position of the maximum value. Define the function:
longesttest <- function(x) {x[which.max(nchar(x))]}
- Use the function longesttest to find the longest review in the text column of the beer reviews data set /anvil/projects/tdm/data/beer/reviews_sample.csv
- Also use the function longesttest to find the longest name in the NAME column of the 1980 election data:
library(data.table)
myDF <- fread("/anvil/projects/tdm/data/election/itcont1980.txt", quote="")
names(myDF) <- c("CMTE_ID", "AMNDT_IND", "RPT_TP", "TRANSACTION_PGI", "IMAGE_NUM", "TRANSACTION_TP", "ENTITY_TP", "NAME", "CITY", "STATE", "ZIP_CODE", "EMPLOYER", "OCCUPATION", "TRANSACTION_DT", "TRANSACTION_AMT", "OTHER_ID", "TRAN_ID", "FILE_NUM", "MEMO_CD", "MEMO_TEXT", "SUB_ID")
- Print the longest review in the text column of the beer reviews data set /anvil/projects/tdm/data/beer/reviews_sample.csv
- Print the longest name in the NAME column of the 1980 election data.
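The steps inside longesttest can be followed one at a time on a short character vector. This is a sketch only; the vector words is made up for illustration.

```r
longesttest <- function(x) {x[which.max(nchar(x))]}

# Made-up strings
words <- c("ale", "stout", "lager", "porter")

nchar(words)              # the length of each string
which.max(nchar(words))   # the position of the longest string
longesttest(words)        # the longest string itself

# When several strings tie for longest, which.max picks the first one
longesttest(c("stout", "lager"))
```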
Question 4 (2 pts)
- Create your own function called mostpopulardate that finds the most popular date in a column of dates, as well as the number of times that date occurs.
- Test your function mostpopulardate on the date column of the beer reviews data /anvil/projects/tdm/data/beer/reviews_sample.csv
- Also test your function mostpopulardate on the TRANSACTION_DT column of the 1980 election data.

- a. Define your function called mostpopulardate
- b. Use your function mostpopulardate to find the most popular date in the beer reviews data /anvil/projects/tdm/data/beer/reviews_sample.csv
- c. Also use your function mostpopulardate to find the most popular transaction date from the 1980 election data.
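As a possible starting point (not the only approach), the table and sort functions can identify the most frequent value in a vector together with its count. The sketch below uses a made-up vector v rather than a column of dates, and the names counts and sorted are chosen only for illustration.

```r
# Made-up values (not from either data set)
v <- c("b", "a", "b", "c", "b", "a")

counts <- table(v)                          # how many times each value occurs
sorted <- sort(counts, decreasing = TRUE)   # largest count first

sorted[1]   # the most frequent value, labeled with its count
```

Wrapping steps like these in a function of one argument is left as part of the question.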
Question 5 (2 pts)
Define a function called myaveragedelay that takes a 3-letter string (corresponding to an airport code) and finds the average departure delay (after removing the NA values) from the DepDelay column of the 1990 flight data /anvil/projects/tdm/data/flights/subset/1990.csv for flights departing from that airport.
Try your function on the Indianapolis "IND" flights. In other words, myaveragedelay("IND") should print 5.96977225672878 because the flights with Origin airport "IND" have an average departure delay of about 5.97 minutes.
Try your function on the New York City "JFK" flights. In other words, myaveragedelay("JFK") should print 11.8572741063607 because the flights with Origin airport "JFK" have an average departure delay of about 11.86 minutes.
- a. Define your function called myaveragedelay
- b. Use myaveragedelay("IND") to print the average departure delay for flights with Origin airport "IND".
- c. Use myaveragedelay("JFK") to print the average departure delay for flights with Origin airport "JFK".
Submitting your Work
Now you know how to write your own functions! Please let us know if you need assistance with this project.
- firstname_lastname_project8.ipynb
You must double check your You will not receive full credit if your |