TDM 10100: Project 10 — 2023

Motivation: As we have learned, functions are foundational to more complex programs and behaviors.
There is an entire programming paradigm based on functions called functional programming.

Context: We will apply functions to entire vectors of data using tapply and sapply. We have learned how to create functions; the next step is to apply them to a series of data.

Make sure to read about and use the template found here, and read the important information about project submissions here.

Dataset(s)

The project will use the following dataset(s):

  • /anvil/projects/tdm/data/restaurant/orders.csv

  • /anvil/projects/tdm/data/restaurant/vendors.csv

The read.csv() function uses a comma (,) as its delimiter by default.
You can use other delimiters by adding the sep argument,
e.g. read.csv(…, sep=';')

You can also load the data.table library and use the fread function.
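A minimal sketch of the delimiter behavior described above. It writes a small semicolon-delimited file to a temporary path (the contents and the name `toy` are invented for illustration) and reads it back with sep. The commented-out lines show how the project files might be loaded; they assume those files are comma-delimited, as the .csv extension suggests, and the fread line assumes the data.table package is installed.

```r
# Invented toy data: a semicolon-delimited file in a temporary location
path <- tempfile(fileext = ".csv")
writeLines(c("id;delivery_charge", "1;0", "2;0.7"), path)

# read.csv() splits fields on commas by default; sep = ";" overrides that
toy <- read.csv(path, sep = ";")
str(toy)

# Assumed loading of the project files (default comma delimiter):
# orders  <- read.csv("/anvil/projects/tdm/data/restaurant/orders.csv")
# vendors <- read.csv("/anvil/projects/tdm/data/restaurant/vendors.csv")
# Alternative, assuming data.table is installed (fread returns a data.table,
# which is also a data.frame):
# library(data.table)
# vendors <- fread("/anvil/projects/tdm/data/restaurant/vendors.csv")
```

Without sep = ";", read.csv would see a single column named id.delivery_charge, which is a quick way to notice a wrong delimiter.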

Questions

Question 1 (2 pts)

Please load the datasets into data frames named orders and vendors.

There are many websites that explain how to use grep and grepl (the l stands for logical) to search for patterns. See, for example: statisticsglobe.com/grep-grepl-r-function-example

  a. Use the grepl function and the subset function to make a new data frame from vendors, containing only the rows with "Fries" in the column called vendor_tag_name.

  b. Now use the grep function and row indexing to make a data frame from vendors that (as before) contains only the rows with "Fries" in the column called vendor_tag_name.

  c. Verify that your data frames in questions 1a and 1b are the same size.
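The difference between the two approaches can be sketched on a small invented stand-in for vendors. Only the column name vendor_tag_name comes from the project; `vendors_toy` and its values are hypothetical. grepl returns a logical vector that subset can filter on, while grep returns the integer positions of the matches, which work directly as row indices.

```r
# Hypothetical toy stand-in for `vendors` (values invented for illustration)
vendors_toy <- data.frame(
  id = 1:4,
  vendor_tag_name = c("Burgers,Fries", "Pizza", "Fries,Sandwiches", "Salads"),
  stringsAsFactors = FALSE
)

# (a) grepl(): TRUE/FALSE for every row; subset() keeps the TRUE rows
fries_a <- subset(vendors_toy, grepl("Fries", vendor_tag_name))

# (b) grep(): integer positions of matching rows, used as row indices
fries_b <- vendors_toy[grep("Fries", vendors_toy$vendor_tag_name), ]

# (c) dim() returns c(rows, columns); the two results should agree
dim(fries_a)
dim(fries_b)
identical(dim(fries_a), dim(fries_b))
```

Note that grepl matches "Fries" anywhere in the string, so a tag list such as "Burgers,Fries" counts as a match.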

Question 2 (2 pts)

  a. In the data frame vendors, there are two types of delivery_charge values: 0 (which represents free delivery) and 0.7 (which represents non-free delivery). Make a table that shows how many of each value appear in the delivery_charge column.

  b. Use the prop.table function to convert these counts into percentages. (prop.table returns proportions that sum to 1; multiply by 100 to express them as percentages.)
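A minimal sketch of the table and prop.table steps, using an invented vector in place of vendors$delivery_charge. It shows that prop.table divides each count by the total and that the result must be scaled by 100 to become a percentage.

```r
# Invented delivery_charge values: 0 = free, 0.7 = non-free
delivery_charge <- c(0, 0.7, 0.7, 0, 0.7)

# table() counts each distinct value
counts <- table(delivery_charge)
counts

# prop.table() gives proportions; * 100 turns them into percentages
pct <- prop.table(counts) * 100
pct
```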

Question 3 (2 pts)

  a. Consider only the vendors with vendor_category_id == 2. Among these vendors, find the percentage of delivery_charge values that are 0 (free delivery) and 0.7 (non-free delivery).

  b. Now consider only the vendors with vendor_category_id == 3, and again find the percentage of delivery_charge values that are 0 (free delivery) and 0.7 (non-free delivery).
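One way to handle a single category at a time is to filter delivery_charge with a logical condition on vendor_category_id and then apply the same table/prop.table steps. The data frame `vendors_toy` below is a hypothetical stand-in with invented values; only the column names come from the project.

```r
# Hypothetical toy stand-in for `vendors` (values invented for illustration)
vendors_toy <- data.frame(
  vendor_category_id = c(2, 2, 2, 2, 3, 3, 3),
  delivery_charge    = c(0, 0.7, 0.7, 0.7, 0, 0, 0.7)
)

# Category 2 only: logical filter, then counts -> percentages
cat2 <- vendors_toy$delivery_charge[vendors_toy$vendor_category_id == 2]
pct2 <- prop.table(table(cat2)) * 100
pct2

# Category 3 only: same steps with a different filter
cat3 <- vendors_toy$delivery_charge[vendors_toy$vendor_category_id == 3]
pct3 <- prop.table(table(cat3)) * 100
pct3
```

The two blocks differ only in the category value, which is the repetition that a single tapply call can remove.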

Question 4 (1 pt)

  a. Solve questions 3a and 3b again, but this time answer both with a single call to tapply. (It is fine to give only the counts here, in question 4a, and to convert the counts to percentages in question 4b.)

  b. Now use a user-defined function inside tapply so that it returns percentages instead of counts.
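A sketch of the tapply pattern on the same kind of invented toy data (`vendors_toy` is hypothetical). tapply(X, INDEX, FUN) splits X by the groups in INDEX and applies FUN to each piece. Because table returns more than one value per group, the result is a list with one table per category rather than a simple vector.

```r
# Hypothetical toy stand-in for `vendors` (values invented for illustration)
vendors_toy <- data.frame(
  vendor_category_id = c(2, 2, 2, 2, 3, 3, 3),
  delivery_charge    = c(0, 0.7, 0.7, 0.7, 0, 0, 0.7)
)

# (a) one call: split delivery_charge by category, table() each group
counts_by_cat <- tapply(vendors_toy$delivery_charge,
                        vendors_toy$vendor_category_id,
                        table)
counts_by_cat

# (b) same split, with an anonymous user-defined function that
#     converts each group's counts into percentages
pct_by_cat <- tapply(vendors_toy$delivery_charge,
                     vendors_toy$vendor_category_id,
                     function(x) prop.table(table(x)) * 100)
pct_by_cat
```

Individual groups can be pulled out by name, for example pct_by_cat[["2"]].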

Question 5 (1 pt)

  a. Starting with your solution to question 4a, use the sapply function to convert your answer from counts into percentages. Your percentages should agree with those you found in question 4b.
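A sketch of post-processing the per-category counts with sapply, again on invented toy data (`vendors_toy` and the other object names are hypothetical). sapply applies the function to each table in the tapply result. Because every table here contains the same two values (0 and 0.7), sapply simplifies the output to a matrix with one column per category; if a category were missing one of the values, the tables would differ in length and sapply would return a list instead.

```r
# Hypothetical toy stand-in for `vendors` (values invented for illustration)
vendors_toy <- data.frame(
  vendor_category_id = c(2, 2, 2, 2, 3, 3, 3),
  delivery_charge    = c(0, 0.7, 0.7, 0.7, 0, 0, 0.7)
)

# Counts per category, as in question 4a
counts_by_cat <- tapply(vendors_toy$delivery_charge,
                        vendors_toy$vendor_category_id,
                        table)

# Convert each category's counts to percentages
pct_matrix <- sapply(counts_by_cat, function(counts) counts / sum(counts) * 100)
pct_matrix

# Sanity check against the tapply-with-function approach from question 4b
pct_tapply <- tapply(vendors_toy$delivery_charge,
                     vendors_toy$vendor_category_id,
                     function(x) prop.table(table(x)) * 100)
all.equal(as.numeric(pct_matrix[, "2"]), as.numeric(pct_tapply[["2"]]))
```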

Project 10 Assignment Checklist

  • Jupyter Lab notebook with your code, comments and output for the assignment

    • firstname-lastname-project10.ipynb

  • R code and comments for the assignment

    • firstname-lastname-project10.R

  • Submit files through Gradescope

Please double-check that your submission is complete and contains all of your code and output before submitting. If you are on a spotty internet connection, we recommend downloading your submission after submitting it to confirm that what you submitted is what you intended to submit.

In addition, please review our submission guidelines before submitting your project.