TDM 10100: Project 12 — 20223

Motivation: In the previous project we manipulated dates, this project we are going to continue to work with dates. Working with dates in R can require more attention than working with other object classes. These packages will help simplify some of the common tasks related to date data.

Dates and times can be complicated. For instance, not every year has 365 days. Dates are difficult because they have to accommodate for the Earth’s rotation and orbit around the sun. We need to handle timezones, daylight savings, etc. If suffices to say that, when focusing on dates and date-times in R, the simpler the better.

Learning Objectives
  • Read and write basic (csv) data.

  • Explain and demonstrate: positional, named, and logical indexing.

  • Utilize apply functions in order to solve a data-driven problem.

  • Gain proficiency using split, merge, and subset.

  • Demonstrate the ability to create basic graphs with default settings.

  • Demonstrate the ability to modify axes labels and titles.

  • Incorporate legends using legend().

  • Demonstrate the ability to customize a plot (color, shape/linetype).

  • Work with dates in a variety of ways.

Make sure to read about, and use the template found here, and the important information about projects submissions here.


The project will use the following dataset:

  • /anvil/projects/tdm/data/restaurant/orders.csv


Go ahead and use the fread function from the data.table library, to read in the dataset to a data frame called orders.

Question 1 (2 pts)

  1. Use the substr function to get (only) the month-and-year of each date in the created_at column. How many times does each month-and-year pair occur? You may find more information about the substr function here: R substring

  2. Now (instead) use the month function and the year function on the created_at column, and make sure that your results agree with the results from 1a.

  3. Finally, use the format function to extract the month-and-year pairs from the created_at column, and make sure that your results (again!) agree with the results from 1a.

Question 2 (2 pts)

  1. Which customer_id placed the largest number of orders altogether? (Each row of the data set represents exactly one order.)

  2. For the customer_id that you found in question 2a, either use the subset function or use indexing to find the month-and-year pair in which that customer placed the most orders.

Question 3 (2 pts)

  1. There are 5 types of payments in the payment_mode column. How many times are each of these 5 types of payments used in the data set?

  2. If we focus on the customer_id found in question 2a, which type of payment does that customer prefer? How many times did that customer use each of the 5 types of payments?

Question 4 (2 pts)

  1. Use the subset function to make a data frame called ordersJan2020 that contains only the orders from January 2020.

  2. Create a plot using the ordersJan2020 data that shows the sum of the grand_total values for each of the 7 days of the week.

Project 12 Assignment Checklist

  • Jupyter Lab notebook with your code, comments and output for the assignment

    • firstname-lastname-project12.ipynb

  • R code and comments for the assignment

    • firstname-lastname-project12.R.

  • Submit files through Gradescope

Please make sure to double check that your submission is complete, and contains all of your code and output before submitting. If you are on a spotty internet connection, it is recommended to download your submission after submitting it to make sure what you think you submitted, was what you actually submitted.

In addition, please review our submission guidelines before submitting your project.