# TDM 10100: Project 12 — 20223

Motivation: In the previous project we manipulated dates; in this project we will continue to work with dates. Working with dates in `R` can require more attention than working with other object classes. The functions used in this project help simplify some of the common tasks related to date data.

Dates and times can be complicated. For instance, not every year has 365 days. Dates are difficult because they have to account for the Earth’s rotation and its orbit around the sun. We also need to handle time zones, daylight saving time, etc. It suffices to say that, when working with dates and date-times in R, the simpler the better.

Learning Objectives
• Read and write basic (csv) data.

• Explain and demonstrate: positional, named, and logical indexing.

• Utilize apply functions in order to solve a data-driven problem.

• Gain proficiency using split, merge, and subset.

• Demonstrate the ability to create basic graphs with default settings.

• Demonstrate the ability to modify axes labels and titles.

• Incorporate legends using legend().

• Demonstrate the ability to customize a plot (color, shape/linetype).

• Work with dates in a variety of ways.

Make sure to read about and use the template found here, and to read the important information about project submissions here.

## Dataset(s)

The project will use the following dataset:

• `/anvil/projects/tdm/data/restaurant/orders.csv`

## Questions

Use the `fread` function from the `data.table` library to read the dataset into a data frame called `orders`.
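As a rough illustration of how `fread` behaves, the sketch below writes a tiny stand-in file and reads it back. The file contents are hypothetical (only the column names `created_at`, `customer_id`, `payment_mode`, and `grand_total` come from this project, and the values are made up), so treat it as a toy example rather than a preview of the real data.

```r
library(data.table)

# Hypothetical stand-in for orders.csv: invented rows, written to a temp file.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "customer_id,created_at,payment_mode,grand_total",
  "101,2020-01-03 12:15:00,cash,18.50",
  "102,2020-01-04 19:40:00,card,42.00"
), tmp)

# fread guesses the separator and the column types on its own.
toy_orders <- fread(tmp)

class(toy_orders)   # a data.table, which is also a data.frame
dim(toy_orders)     # number of rows and columns
head(toy_orders)
```

For the project itself, the same call would presumably take the path listed under Dataset(s) in place of `tmp`.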

### Question 1 (2 pts)

1. Use the `substr` function to get (only) the month-and-year of each date in the `created_at` column. How many times does each month-and-year pair occur? You may find more information about the `substr` function here: R substring

2. Now (instead) use the `month` function and the `year` function on the `created_at` column, and make sure that your results agree with the results from part 1.

3. Finally, use the `format` function to extract the month-and-year pairs from the `created_at` column, and make sure that your results (again!) agree with the results from part 1.
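The three approaches in this question can be compared on a small made-up vector. This is a minimal sketch under two assumptions that are not stated in the project: that timestamps look like `YYYY-MM-DD HH:MM:SS`, and that `month` and `year` are the functions exported by `data.table` (other packages also provide functions with these names).

```r
library(data.table)  # assumed source of month() and year()

# Hypothetical timestamps; the real created_at values may be formatted differently.
created_at <- c("2019-12-30 08:05:00", "2020-01-03 12:15:00",
                "2020-01-04 19:40:00", "2020-02-11 13:00:00")

# Approach 1: in the assumed format, characters 1 through 7 hold "YYYY-MM".
by_substr <- table(substr(created_at, 1, 7))

# Approach 2: extract year and month as integers, then paste them together.
dates <- as.POSIXct(created_at, tz = "UTC")
by_parts <- table(paste(year(dates), sprintf("%02d", month(dates)), sep = "-"))

# Approach 3: "%Y-%m" requests a four-digit year and a two-digit month.
by_format <- table(format(dates, "%Y-%m"))

by_substr

# The three tables should carry the same labels and the same counts.
identical(names(by_substr), names(by_parts)) &&
  identical(names(by_parts), names(by_format)) &&
  all(by_substr == by_parts) && all(by_parts == by_format)
```

Zero-padding the month with `sprintf("%02d", ...)` is a design choice that makes the labels from the second approach match the other two character for character.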

### Question 2 (2 pts)

1. Which `customer_id` placed the largest number of orders altogether? (Each row of the data set represents exactly one order.)

2. For the `customer_id` that you found in part 1 of this question, either use the `subset` function or use indexing to find the month-and-year pair in which that customer placed the most orders.
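One possible way to combine `table`, `which.max`, and `subset` for this kind of question is sketched below on invented data. The data frame `toy` and its values are hypothetical; only the column names come from the project.

```r
# Hypothetical toy data: each row stands for one order.
toy <- data.frame(
  customer_id = c(7, 7, 7, 9, 9, 4),
  created_at  = as.POSIXct(c("2020-01-03", "2020-01-19", "2020-02-02",
                             "2020-01-05", "2020-03-01", "2020-02-14"),
                           tz = "UTC")
)

# Count rows per customer, then take the label of the largest count.
counts <- table(toy$customer_id)
top_id <- names(which.max(counts))   # a character string, here "7"

# Keep only that customer's rows and tabulate their month-and-year pairs.
top_rows   <- subset(toy, customer_id == top_id)
top_months <- table(format(top_rows$created_at, "%Y-%m"))

top_months
names(which.max(top_months))
```

Note that `which.max` returns only the first maximum, so with ties it is worth inspecting the full table (for example with `sort(counts, decreasing = TRUE)`) instead of relying on the single answer.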

### Question 3 (2 pts)

1. There are 5 types of payments in the `payment_mode` column. How many times is each of these 5 payment types used in the data set?

2. If we focus on the `customer_id` found in part 1 of Question 2, which type of payment does that customer prefer? How many times did that customer use each of the 5 types of payments?
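When counting categories for a single customer, a payment type that the customer never used silently drops out of a plain `table` on character data. The sketch below, on invented data with made-up payment labels, shows one way to keep the zero counts by converting the column to a factor first.

```r
# Hypothetical toy data; the payment labels are invented for illustration.
toy <- data.frame(
  customer_id  = c(7, 7, 7, 9, 9, 4),
  payment_mode = c("card", "cash", "card", "cash", "cash", "wallet")
)

# Overall usage of each payment type.
overall <- table(toy$payment_mode)

# A factor remembers every level, so unused types are reported as 0.
modes        <- factor(toy$payment_mode)
one_customer <- table(modes[toy$customer_id == 7])

overall
one_customer
names(which.max(one_customer))   # the most frequent type for this customer
```

Logical indexing (`modes[toy$customer_id == 7]`) is used here; `subset` would work equally well.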

### Question 4 (2 pts)

1. Use the `subset` function to make a data frame called `ordersJan2020` that contains only the orders from January 2020.

2. Create a plot using the `ordersJan2020` data that shows the sum of the `grand_total` values for each of the 7 days of the week.
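The filter-then-aggregate pattern behind this question can be sketched on a few invented rows. Everything in `toy` is hypothetical, and the use of `format(..., "%u")` (weekday as a number, Monday = 1) is one choice among several; it is used here because, unlike `weekdays`, it does not depend on the locale.

```r
# Hypothetical toy data: three orders in January 2020 and one in February.
toy <- data.frame(
  created_at  = as.POSIXct(c("2020-01-06 12:00:00", "2020-01-07 18:30:00",
                             "2020-01-13 09:15:00", "2020-02-03 20:00:00"),
                           tz = "UTC"),
  grand_total = c(10, 25, 5, 40)
)

# Keep only rows whose year-month label is January 2020.
toyJan2020 <- subset(toy, format(created_at, "%Y-%m") == "2020-01")

# Weekday as a factor with all 7 levels, ordered Monday through Sunday.
day_of_week <- factor(format(toyJan2020$created_at, "%u"),
                      levels = as.character(1:7),
                      labels = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"))

# Sum grand_total within each weekday; days with no orders come back as NA.
totals <- tapply(toyJan2020$grand_total, day_of_week, sum)
totals[is.na(totals)] <- 0

barplot(totals,
        main = "Toy example: total by day of week",
        xlab = "Day of week", ylab = "Sum of grand_total")
```

Fixing the factor levels up front keeps the bars in calendar order and makes sure all 7 days appear, even those with no orders.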

## Project 12 Assignment Checklist

• Jupyter Lab notebook with your code, comments and output for the assignment

• `firstname-lastname-project12.ipynb`

• R code and comments for the assignment

• `firstname-lastname-project12.R`