STAT 29000: Project 14 — Spring 2022

Motivation: Rearranging data to and from "long" and "wide" formats sounds like a difficult task; however, `tidyverse` has a variety of functions that make it easy.

Context: This is the last project for the course. This project focuses on how data can change when grouped differently, and on using the `pivot` functions.

Scope: R, tidyverse, ggplot

Learning Objectives
• Use mutate, pivot, unite, filter, and arrange to wrangle data and solve data-driven problems.

• Combine different data using joins (left_join, right_join, semi_join, anti_join), and bind_rows.

• Group data and calculate aggregated statistics using group_by, mutate, summarize, and transform functions.

• Demonstrate the ability to create basic graphs with default settings, in ggplot.

• Demonstrate the ability to modify axes labels and titles.

Make sure to read about and use the template found here, and the important information about project submissions here.

Dataset(s)

The following questions will use the following dataset(s):

• `/depot/datamine/data/death_records/DeathRecords.csv`

Questions

Question 1

Calculate the average age of death for each of the `MaritalStatus` values and create a `barplot` using `ggplot` and `geom_col`.
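As a rough illustration of the general pattern (group, summarize, then plot with `geom_col`), here is a minimal sketch on a small made-up data frame. The names `toy`, `status`, `age`, and `avg_age` are hypothetical and are not part of the death records dataset.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data, NOT the death records dataset
toy <- tibble(
  status = c("A", "A", "B", "B", "C"),
  age    = c(60, 70, 50, 54, 80)
)

# One row per group, holding the mean of `age` within that group
avg_by_status <- toy %>%
  group_by(status) %>%
  summarize(avg_age = mean(age))

# geom_col draws bars whose heights are taken directly from the y aesthetic
p <- ggplot(avg_by_status, aes(x = status, y = avg_age)) +
  geom_col() +
  labs(x = "Status", y = "Average age", title = "Average age by status")
p
```

`geom_col` is used rather than `geom_bar` here because the bar heights are already computed; `geom_bar` counts rows by default.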

Items to submit
• Code used to solve this problem.

• Output from running the code.

Question 2

Now, let’s further group our data by `Sex` to see how the patterns change (if at all). Create a side-by-side bar plot where `Sex` is shown for each of the 5 `MaritalStatus` values.
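One common way to get side-by-side bars in `ggplot` is to map the second grouping variable to `fill` and pass `position = "dodge"` to `geom_col`. The sketch below shows this on made-up data; the names `toy`, `status`, `sex`, `age`, and `avg_age` are hypothetical, and this is only one of several possible approaches.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data, NOT the death records dataset
toy <- tibble(
  status = c("A", "A", "A", "A", "B", "B", "B", "B"),
  sex    = c("F", "F", "M", "M", "F", "F", "M", "M"),
  age    = c(70, 80, 60, 64, 50, 58, 40, 44)
)

# Mean of `age` for every (status, sex) combination
avg_by_both <- toy %>%
  group_by(status, sex) %>%
  summarize(avg_age = mean(age), .groups = "drop")

# fill = sex splits each status into one bar per sex;
# position = "dodge" places those bars next to each other instead of stacking
p <- ggplot(avg_by_both, aes(x = status, y = avg_age, fill = sex)) +
  geom_col(position = "dodge")
p
```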

Items to submit
• Code used to solve this problem.

• Output from running the code.

Question 3

In the previous question, before you piped the data into `ggplot` functions, you likely used `group_by` and `summarize`. Take, for example, the following.

```
dat %>%
    group_by(MaritalStatus, Sex) %>%
    summarize(age_of_death=mean(Age))
```

Output

```
MaritalStatus	Sex	age_of_death
<chr>	<chr>	<dbl>
D	F	70.34766
D	M	65.60564
M	F	69.81002
M	M	73.05787
S	F	56.83075
S	M	49.12891
U	F	80.80274
U	M	80.27476
W	F	85.69817
W	M	83.98783
```

Is this data "long" or "wide"?

There are multiple ways we could make this data "wider". Let’s say, for example, we want to rearrange the data so that we have the `MaritalStatus` column, a `M` column, and `F` column. The `M` column contains the average age of death for males and the `F` column the same for females. While this may sound complicated to do, `pivot_wider` makes this very easy.

Use `pivot_wider` to rearrange the data as described.
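To show the mechanics of `pivot_wider` in isolation, here is a small sketch on an unrelated, made-up data frame. The names `long`, `wide`, `city`, `season`, and `avg_temp` are hypothetical. `names_from` picks the column whose values become new column names, and `values_from` picks the column that fills those new columns.

```r
library(dplyr)
library(tidyr)

# Hypothetical "long" data: one row per (city, season) pair
long <- tibble(
  city     = c("X", "X", "Y", "Y"),
  season   = c("summer", "winter", "summer", "winter"),
  avg_temp = c(30, 5, 25, 10)
)

# Spread the values of `season` into their own columns,
# filling each new column with the matching `avg_temp`
wide <- long %>%
  pivot_wider(names_from = season, values_from = avg_temp)

wide
```

The result has one row per `city` and one column for each distinct value of `season`.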

Items to submit
• Code used to solve this problem.

• Output from running the code.

Question 4

Create a ggplot plot for each month. Each plot should be a barplot with the `as.factor(DayOfWeekOfDeath)` on the x-axis and the count on the y-axis. The code below provides some structure to help get you started.

```
g <- list() # to save plots to
for (i in 1:12) {
    g[[i]] <- dat %>%
        filter(...) %>%
        ggplot() +
        geom_bar(...)
}

library(patchwork)
library(repr)

# change plot size to 12 by 12
options(repr.plot.width=12, repr.plot.height=12)

# use patchwork to display all plots in a grid
# https://cran.r-project.org/web/packages/patchwork/vignettes/patchwork.html
```

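For reference, one way to lay out a list of plots with `patchwork` is its `wrap_plots` function, which accepts a list of ggplot objects. The sketch below uses made-up data and only four panels; the names `plots`, `toy`, `x`, `n`, and `grid` are hypothetical, and the grid dimensions are an arbitrary choice.

```r
library(ggplot2)
library(patchwork)

# Build a small list of plots from hypothetical toy data
plots <- list()
for (i in 1:4) {
  toy <- data.frame(x = factor(c("a", "b", "c")), n = c(i, 2 * i, 3 * i))
  plots[[i]] <- ggplot(toy, aes(x = x, y = n)) +
    geom_col() +
    ggtitle(paste("Panel", i))
}

# wrap_plots arranges the plots in the list into a grid; ncol sets the column count
grid <- wrap_plots(plots, ncol = 2)
grid
```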
Items to submit
• Code used to solve this problem.

• Output from running the code.

Question 5

Question 4 is a bit tedious. `tidyverse` provides a much more ergonomic way to create plots like this. Use `facet_wrap` to create the same plot.

You do not need to use a loop to solve this problem anymore. In fact, you only need to add 1 more line of code to this part.

```
dat %>%
    filter(....) %>%
    ggplot() +
    geom_bar(...) +
    # new stuff here
```
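As a generic illustration of faceting, the sketch below applies `facet_wrap` to a made-up data frame. The names `toy`, `panel_id`, and `category` are hypothetical. The formula `~ panel_id` asks for one panel per distinct value of that column.

```r
library(ggplot2)

# Hypothetical toy data: each row is one observation
toy <- data.frame(
  panel_id = c(1, 1, 1, 2, 2, 2),
  category = c("a", "b", "b", "a", "a", "b")
)

# geom_bar counts the rows per category; facet_wrap repeats the plot per panel_id
p <- ggplot(toy, aes(x = as.factor(category))) +
  geom_bar() +
  facet_wrap(~ panel_id)
p
```

Compared with building a list of plots in a loop, faceting keeps all panels in a single plot object with shared axes by default.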

Are there any patterns in the data that you find interesting?

Items to submit
• Code used to solve this problem.

• Output from running the code.

Question 6

It has been a fun year. We hope that you learned something new!

• Write down 3 (or more) of your least favorite topics and/or projects from this past year (for STAT 29000).