# STAT 29000: Project 14 — Spring 2022

Motivation: Rearranging data to and from "long" and "wide" formats sounds like a difficult task; however, `tidyverse` has a variety of functions that make it easy.

Context: This is the last project for the course. It focuses on how data can change when grouped differently, and on using the `pivot` functions.

Scope: R, tidyverse, ggplot

Learning Objectives
• Use mutate, pivot, unite, filter, and arrange to wrangle data and solve data-driven problems.

• Combine different data using joins (left_join, right_join, semi_join, anti_join), and bind_rows.

• Group data and calculate aggregated statistics using group_by, mutate, summarize, and transform functions.

• Demonstrate the ability to create basic graphs with default settings in ggplot.

• Demonstrate the ability to modify axes labels and titles.

Make sure to read about and use the template found here, and to read the important information about project submissions here.

## Dataset(s)

The questions below use the following dataset(s):

• `/depot/datamine/data/death_records/DeathRecords.csv`

## Questions

### Question 1

Calculate the average age of death for each of the `MaritalStatus` values and create a `barplot` using `ggplot` and `geom_col`.
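
As a rough illustration of the general pattern (group, summarize, then plot the summarized values with `geom_col`), here is a minimal sketch on made-up data. The data frame `toy` and its columns `group` and `value` are hypothetical and are not part of the death records dataset.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data (assumption: not related to DeathRecords.csv)
toy <- data.frame(
  group = c("a", "a", "b", "b", "c"),
  value = c(10, 20, 30, 50, 5)
)

# Collapse to one row per group, holding that group's mean
toy_means <- toy %>%
  group_by(group) %>%
  summarize(mean_value = mean(value))

# geom_col uses the y values as bar heights directly,
# so it expects data that has already been summarized
p <- ggplot(toy_means, aes(x = group, y = mean_value)) +
  geom_col() +
  labs(x = "Group", y = "Mean value", title = "Mean value by group")
p
```

Unlike `geom_bar`, which counts rows by default, `geom_col` plots the values it is given, which is why the summary step comes first.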

Items to submit
• Code used to solve this problem.

• Output from running the code.

### Question 2

Now, let’s further group our data by `Sex` to see how the patterns change (if at all). Create a side-by-side bar plot where `Sex` is shown for each of the 5 `MaritalStatus` values.
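
One common way to get side-by-side bars is to map the second grouping variable to `fill` and set `position = "dodge"`. The sketch below shows this on made-up data; the names `toy`, `group`, `kind`, and `value` are hypothetical.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data with two grouping variables
toy <- data.frame(
  group = c("a", "a", "a", "a", "b", "b", "b", "b"),
  kind  = c("x", "x", "y", "y", "x", "x", "y", "y"),
  value = c(1, 3, 5, 7, 2, 4, 6, 10)
)

# One row per (group, kind) combination
toy_means <- toy %>%
  group_by(group, kind) %>%
  summarize(mean_value = mean(value), .groups = "drop")

# fill splits each x position by kind;
# position = "dodge" places the bars next to each other instead of stacking
p <- ggplot(toy_means, aes(x = group, y = mean_value, fill = kind)) +
  geom_col(position = "dodge")
p
```

Without `position = "dodge"`, `geom_col` stacks the bars, which makes means harder to compare.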

Items to submit
• Code used to solve this problem.

• Output from running the code.

### Question 3

In the previous question, before you piped the data into `ggplot` functions, you likely used `group_by` and `summarize`. Take, for example, the following.

```r
dat %>%
  group_by(MaritalStatus, Sex) %>%
  summarize(age_of_death = mean(Age))
```
Output:

```
MaritalStatus  Sex    age_of_death
<chr>          <chr>  <dbl>
D              F      70.34766
D              M      65.60564
M              F      69.81002
M              M      73.05787
S              F      56.83075
S              M      49.12891
U              F      80.80274
U              M      80.27476
W              F      85.69817
W              M      83.98783
```

Is this data "long" or "wide"?

There are multiple ways we could make this data "wider". Let’s say, for example, we want to rearrange the data so that we have the `MaritalStatus` column, an `M` column, and an `F` column. The `M` column contains the average age of death for males, and the `F` column contains the same for females. While this may sound complicated to do, `pivot_wider` makes this very easy.

Use `pivot_wider` to rearrange the data as described.
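
To make the "long" versus "wide" distinction concrete, here is a small sketch of `pivot_wider` and its inverse, `pivot_longer`, on made-up data. The data frame `long` and its columns `region`, `quarter`, and `sales` are hypothetical.

```r
library(tidyr)

# Hypothetical "long" data: one row per (region, quarter) pair
long <- data.frame(
  region  = c("north", "north", "south", "south"),
  quarter = c("q1", "q2", "q1", "q2"),
  sales   = c(10, 12, 7, 9)
)

# names_from: the column whose values become new column names
# values_from: the column whose values fill those new columns
wide <- pivot_wider(long, names_from = quarter, values_from = sales)
wide

# pivot_longer reverses the operation
back <- pivot_longer(wide, cols = c(q1, q2),
                     names_to = "quarter", values_to = "sales")
back
```

In the wide form, each `region` appears once and each quarter has its own column; in the long form, each row is a single observation.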

Items to submit
• Code used to solve this problem.

• Output from running the code.

### Question 4

Create a `ggplot` plot for each month. Each plot should be a barplot with `as.factor(DayOfWeekOfDeath)` on the x-axis and the count on the y-axis. The code below provides some structure to help you get started.

```r
g <- list() # to save plots to
for (i in 1:12) {
  g[[i]] <- dat %>%
    filter(...) %>%
    ggplot() +
    geom_bar(...)
}

library(patchwork)
library(repr)

# change plot size to 12 by 12
options(repr.plot.width = 12, repr.plot.height = 12)

# use patchwork to display all plots in a grid
# https://cran.r-project.org/web/packages/patchwork/vignettes/patchwork.html
```

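
As one possible way to arrange a list of plots in a grid, the sketch below uses `wrap_plots` from `patchwork` on made-up data. The data frame `toy` and its columns `panel` and `category` are hypothetical, and four panels are used instead of twelve to keep the example short.

```r
library(ggplot2)
library(patchwork)

# Hypothetical toy data: 4 panels with 25 rows each
set.seed(1)
toy <- data.frame(
  panel    = rep(1:4, each = 25),
  category = sample(c("a", "b", "c"), size = 100, replace = TRUE)
)

# Build one bar plot per panel and store it in a list
plots <- list()
for (i in 1:4) {
  plots[[i]] <- ggplot(toy[toy$panel == i, ], aes(x = category)) +
    geom_bar() +
    ggtitle(paste("Panel", i))
}

# wrap_plots accepts a list of plots and lays them out in a grid
grid <- wrap_plots(plots, ncol = 2)
grid
```
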
Items to submit
• Code used to solve this problem.

• Output from running the code.

### Question 5

Question 4 is a bit tedious. `tidyverse` provides a much more ergonomic way to create plots like this. Use `facet_wrap` to create the same plot.

You do not need to use a loop to solve this problem anymore. In fact, you only need to add 1 more line of code to this part.

```r
dat %>%
  filter(...) %>%
  ggplot() +
  geom_bar(...) +
  # new stuff here
```
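
For a sense of how `facet_wrap` replaces a loop over subsets, here is a minimal sketch on made-up data. The names `toy`, `panel`, and `category` are hypothetical.

```r
library(ggplot2)

# Hypothetical toy data: 4 panels with 25 rows each
set.seed(1)
toy <- data.frame(
  panel    = rep(1:4, each = 25),
  category = sample(c("a", "b", "c"), size = 100, replace = TRUE)
)

# facet_wrap splits the data by `panel` and draws one sub-plot per value,
# so no explicit loop or list of plots is needed
p <- ggplot(toy, aes(x = category)) +
  geom_bar() +
  facet_wrap(~ panel, ncol = 2)
p
```

By default the facets share the same axes, which makes it easier to compare counts across panels.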

Are there any patterns in the data that you find interesting?

Items to submit
• Code used to solve this problem.

• Output from running the code.

### Question 6

It has been a fun year. We hope that you learned something new!

• Write down 3 (or more) of your least favorite topics and/or projects from this past year (for STAT 29000).