STAT 19000: Project 13 — Fall 2021

Motivation: It is always important to stay fresh and continue to hone and improve your skills. For example, games and events like https://adventofcode.com/ are a great way to keep thinking and learning. Plus, you can solve the puzzles with any language you want! It can be a fun way to learn a new programming language.

Proper Preparation Prevents Poor Performance.

— James Baker

In this project we will continue to wade through data, with a special focus on the apply suite of functions, building your own functions, and graphics.

Context: This is the last project of the semester! Many of you will have already finished your 10 projects, but for those who have not, this should be a fun and straightforward way to keep practicing.

Scope: R

Learning Objectives
  • Read and write basic (csv) data.

  • Explain and demonstrate: positional, named, and logical indexing.

  • Utilize apply functions in order to solve a data-driven problem.

  • Gain proficiency using split, merge, and subset.

  • Demonstrate the ability to create basic graphs with default settings.

  • Demonstrate the ability to modify axes labels and titles.

  • Incorporate legends using legend().

  • Demonstrate the ability to customize a plot (color, shape/linetype).

  • Convert strings to dates, and format dates using the lubridate package.

Make sure to read about, and use, the template found here, and the important information about project submissions here.

Dataset(s)

The following questions will use the following dataset(s):

  • /anvil/projects/tdm/data/iowa_liquor_sales/iowa_liquor_sales_cleaner.txt

Questions

Question 1

Run the lines of code below, taken from Project 12, to read the data and format the year and month.

library(data.table)
library(lubridate)

liquor <- fread('/anvil/projects/tdm/data/iowa_liquor_sales/iowa_liquor_sales_cleaner.txt')
liquor$date <- mdy(liquor$Date)
liquor$year <- year(liquor$date)
liquor$month <- month(liquor$date)

Run the code below to get a better understanding of the columns State Bottle Cost and State Bottle Retail.

head(liquor[,c("State Bottle Cost", "State Bottle Retail")])
typeof(liquor$`State Bottle Cost`)
typeof(liquor$`State Bottle Retail`)

Create two new columns, cost and retail, that are numeric versions of State Bottle Cost and State Bottle Retail, respectively.

Once you have those two new columns, create a column called profit that is the profit for each sale. Which sale had the highest profit?

There are many ways to solve this question. The relevant topics below list functions that appear in some possible solutions.

Relevant topics: gsub, substr, nchar, as.numeric, which.max
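As a rough illustration of how these functions fit together, here is a minimal sketch on a small made-up vector rather than the real data. It assumes the values are stored as character strings with a leading dollar sign (check the output of head() and typeof() to confirm what the real columns look like). The names prices_chr, prices_num, prices_num_b, and best are hypothetical.

```r
# Toy character vector standing in for a price column (made-up values).
prices_chr <- c("$4.50", "$12.99", "$7.25")

# Option A: strip the dollar sign with gsub, then convert.
# fixed = TRUE treats "$" literally instead of as a regex anchor.
prices_num <- as.numeric(gsub("$", "", prices_chr, fixed = TRUE))

# Option B: drop the first character with substr and nchar.
prices_num_b <- as.numeric(substr(prices_chr, 2, nchar(prices_chr)))

# which.max returns the position of the largest value.
best <- which.max(prices_num)
prices_chr[best]
```

Either option should give the same numeric vector here. The position returned by which.max can then be used to index the rows of a data frame.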

Items to submit
  • Code used to solve this problem.

  • Output from running the code.

  • The date, vendor name, number of bottles sold and profit for the sale with the highest profit.

Question 2

We want to provide useful information based on a Vendor Number to help in the decision-making process.

Create a function called createDashboard that takes two arguments, a specific Vendor Number and the liquor data frame, and returns a plot of the average profit per year for that Vendor Number.

Relevant topics: tapply, plot, mean
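The grouping step can be sketched on toy data as follows. This is only an illustration of tapply followed by plot, not the full function. The data frame toy and its values are made up.

```r
# Toy data: made-up profits for one hypothetical vendor.
toy <- data.frame(
  year   = c(2019, 2019, 2020, 2020, 2021),
  profit = c(10, 20, 30, 50, 40)
)

# tapply splits profit by year and applies mean to each group.
avg_by_year <- tapply(toy$profit, toy$year, mean)

# The names of the result are the group labels, stored as strings.
plot(as.numeric(names(avg_by_year)), as.vector(avg_by_year), type = "b",
     xlab = "Year", ylab = "Average profit",
     main = "Average profit per year (toy data)")
```

In a dashboard function, the same two steps would run on the rows that match the requested vendor.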

Items to submit
  • Code used to solve this problem.

  • Output from running the code.

  • The results of running createDashboard(255, liquor).

Question 3

Modify your createDashboard function so that it uses the liquor data frame as the default value if the user forgets to pass a data frame to the function.
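Default argument values in R are written as name = value in the function definition. A minimal sketch, using a hypothetical data frame scores and a hypothetical function meanValue that are not part of the project:

```r
# Hypothetical data frame and function, for illustration only.
scores <- data.frame(id = c(1, 1, 2), value = c(3, 5, 8))

# `data = scores` is used whenever the caller omits the second argument.
meanValue <- function(id, data = scores) {
  mean(data$value[data$id == id])
}

meanValue(1)                                   # uses the default, scores
meanValue(1, data.frame(id = 1, value = 100))  # uses the supplied data frame
```

The default is looked up when the function is called, so the default object only needs to exist by then.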

We are going to start adding additional plots to your function. Run the code below first, before you run the code to build your plots. This arranges multiple plots in a single figure.

par(mfrow=c(1, 2))

Note that we are creating a dashboard in this question with 1 row and 2 columns.
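To see how the layout behaves, here is a small sketch with two placeholder plots on made-up values. Saving and restoring the old settings is optional, and the names old_par and layout_used are hypothetical.

```r
# Save the old settings so they can be restored afterwards.
old_par <- par(mfrow = c(1, 2))   # 1 row, 2 columns
layout_used <- par("mfrow")

# Each high-level plotting call fills the next cell, left to right.
plot(1:10, (1:10)^2, main = "Left panel")
barplot(c(a = 3, b = 5, c = 2), main = "Right panel")

par(old_par)  # restore the previous layout
```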

Add a bar plot to your dashboard that shows the total volume sold using Bottle Volume (ml).

Make sure to add titles to your plots.

Relevant topics: table, barplot
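As a hedged illustration of how table and barplot work together, the sketch below counts made-up bottle sizes and draws one bar per size. Whether counts or some other summary best represents "total volume sold" is left for you to decide. The vector sizes is hypothetical.

```r
# Made-up bottle sizes in ml, one entry per hypothetical sale.
sizes <- c(750, 750, 1000, 375, 750, 1000)

# table counts how many times each distinct value occurs.
size_counts <- table(sizes)

# barplot draws one bar per entry, labelled by the table names.
barplot(size_counts, main = "Sales by bottle size (toy data)",
        xlab = "Bottle volume (ml)", ylab = "Number of sales")
```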

Items to submit
  • Code used to solve this problem.

  • Output from running the code.

  • The results of running createDashboard(255).

Question 4

Change the par(mfrow=c(1, 2)) call to par(mfrow=c(2, 2)) so we can fit 2 more plots in our dashboard.

Create a plot that shows the average number of bottles sold per month.

Optional: Modify the argument mar in par() to reduce the margins between the plots in our dashboard.
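A minimal sketch of the mar setting, using four placeholder plots on made-up values. The margin values chosen here are only an example, and old_par and margins_used are hypothetical names.

```r
# mar is c(bottom, left, top, right), measured in lines of text.
# The default is c(5, 4, 4, 2) + 0.1; smaller values give tighter panels.
old_par <- par(mfrow = c(2, 2), mar = c(4, 4, 2, 1))
margins_used <- par("mar")

for (i in 1:4) {
  plot(1:5, (1:5) * i, main = paste("Panel", i), xlab = "x", ylab = "y")
}

par(old_par)  # restore the previous settings
```

Margins that are too small can clip axis labels and titles, so it is worth checking the output after each change.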

Relevant topics: tapply, plot, mean

Items to submit
  • Code used to solve this problem.

  • Output from running the code.

  • The results of running createDashboard(255).

Question 5

Add a plot to complete our dashboard. Write 1-2 sentences explaining why you chose that plot.

Optional: Add, remove, and/or modify plots in the dashboard so that it contains information you find relevant. Make sure to document why you are making the changes.

Relevant topics: tapply, plot, mean

Items to submit
  • Code used to solve this problem.

  • Output from running the code.

  • The results of running createDashboard(255).

Question 6 (optional, 0 pts)

patchwork is a very cool R package that makes for a simple and intuitive way to combine many ggplot plots into a single graphic. See here for details.

Re-write your function createDashboard to use patchwork and ggplot.
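For orientation, here is a small sketch of combining ggplot objects with patchwork, on made-up data. It assumes the ggplot2 and patchwork packages are installed. The objects toy, p1, p2, and combined are hypothetical.

```r
library(ggplot2)
library(patchwork)

# Toy data, made up for illustration.
toy <- data.frame(x = 1:5, y = c(2, 4, 3, 6, 5))

p1 <- ggplot(toy, aes(x, y)) + geom_line() + ggtitle("Line")
p2 <- ggplot(toy, aes(x, y)) + geom_col() + ggtitle("Bars")

# With patchwork loaded, `+` places plots side by side and `/` stacks them.
combined <- (p1 + p2) / p1
combined
```

Because the combined object is an ordinary value, a function can build it and return it instead of drawing with par(mfrow=...).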

Items to submit
  • Code used to solve this problem.

  • Output from running the code.

Question 7 (optional, 0 pts)

Use your createDashboard function to compare 2 vendors. You can save the dashboard to a PDF using the code below.

pdf(file = "myFilename.pdf",   # The directory and name you want to save the file in
    width = 8, # The width of the plot in inches
    height = 8) # The height of the plot in inches

createDashboard(255)

dev.off()

Items to submit
  • Code used to solve this problem.

  • Output from running the code.

Please double-check that your submission is complete and contains all of your code and output before submitting. If you are on a spotty internet connection, it is recommended that you download your submission after submitting it, to make sure that what you think you submitted is what you actually submitted.