STAT 29000: Project 14 — Spring 2022

Motivation: Rearranging data to and from "long" and "wide" formats sounds like a difficult task, however, tidyverse has a variety of function that make it easy.

Context: This is the last project for the course. This project has a focus on how data can change when grouped differently, and using the pivot functions.

Scope: R, tidyverse, ggplot

Learning Objectives
  • Use mutate, pivot, unite, filter, and arrange to wrangle data and solve data-driven problems.

  • Combine different data using joins (left_join, right_join, semi_join, anti_join), and bind_rows.

  • Group data and calculate aggregated statistics using group_by, mutate, summarize, and transform functions.

  • Demonstrate the ability to create basic graphs with default settings, in ggplot.

  • Demonstrate the ability to modify axes labels and titles.

Make sure to read about, and use the template found here, and the important information about projects submissions here.

Dataset(s)

The following questions will use the following dataset(s):

  • /depot/datamine/data/death_records/DeathRecords.csv

Questions

Question 1

Calculate the average age of death for each of the MaritalStatus values and create a barplot using ggplot and geom_col.

Items to submit
  • Code used to solve this problem.

  • Output from running the code.

Question 2

Now, let’s further group our data by Sex to see how the patterns change (if at all). Create a side-by-side bar plot where Sex is shown for each of the 5 MaritalStatus values.

Items to submit
  • Code used to solve this problem.

  • Output from running the code.

Question 3

In the previous question, before you piped the data into ggplot functions, you likely used group_by and summarize. Take, for example, the following.

dat %>%
    group_by(MaritalStatus, Sex) %>%
    summarize(age_of_death=mean(Age))
output
MaritalStatus	Sex	age_of_death
<chr>	<chr>	<dbl>
D	F	70.34766
D	M	65.60564
M	F	69.81002
M	M	73.05787
S	F	56.83075
S	M	49.12891
U	F	80.80274
U	M	80.27476
W	F	85.69817
W	M	83.98783

Is this data "long" or "wide"?

There are multiple ways we could make this data "wider". Let’s say, for example, we want to rearrange the data so that we have the MaritalStatus column, a M column, and F column. The M column contains the average age of death for males and the F column the same for females. While this may sound complicated to do, pivot_wider makes this very easy.

Use pivot_wider to rearrange the data as described.

Items to submit
  • Code used to solve this problem.

  • Output from running the code.

Question 4

Create a ggplot plot for each month. Each plot should be a barplot with the as.factor(DayOfWeekOfDeath) on the x-axis and the count on the y-axis. The code below provides some structure to help get you started.

g <- list() # to save plots to
for (i in 1:12) {
    g[[i]] <- dat %>%
        filter(...) %>%
        ggplot() +
        geom_bar(...)
}

library(patchwork)
library(repr)

# change plot size to 12 by 12
options(repr.plot.width=12, repr.plot.height=12)

# use patchwork to display all plots in a grid
# https://cran.r-project.org/web/packages/patchwork/vignettes/patchwork.html
Items to submit
  • Code used to solve this problem.

  • Output from running the code.

Question 5

Question 4 is a bit tedious. tidyverse provides a much more ergonomic way to create plots like this. Use facet_wrap to create the same plot.

You do not need to use a loop to solve this problem anymore. In face, you only need to add 1 more line of code to this part.

dat %>%
    filter(....) %>%
    ggplot() +
    geom_bar(...) +
    # new stuff here

Are there any patterns in the data that you find interesting?

Items to submit
  • Code used to solve this problem.

  • Output from running the code.

Question 6

It has been a fun year. We hope that you learned something new!

  • Write down 3 (or more) of your least favorite topics and/or projects from this past year (for STAT 29000).

  • Write down 3 (or more) of your favorite projects and/or topics you wish you were able to learn more about.

Items to submit
  • Code used to solve this problem.

  • Output from running the code.

Please make sure to double check that your submission is complete, and contains all of your code and output before submitting. If you are on a spotty internet connect ion, it is recommended to download your submission after submitting it to make sure what you think you submitted, was what you actually submitted.

In addition, please review our submission guidelines before submitting your project.