# STAT 19000: Project 12 — Fall 2020

Motivation: In the previous project you were forced to do a little bit of date manipulation. Dates can be very difficult to work with, regardless of the language you are using. `lubridate` is a package within the famous tidyverse that greatly simplifies some of the most common tasks one needs to perform with date data.

Context: We’ve been reviewing topics learned this semester. In this project we will continue solving data-driven problems, wrangling data, and creating graphics. We will introduce a tidyverse package that adds great stand-alone value when working with dates.

Scope: R

Learning objectives
• Read and write basic (csv) data.

• Explain and demonstrate: positional, named, and logical indexing.

• Utilize apply functions in order to solve a data-driven problem.

• Gain proficiency using split, merge, and subset.

• Demonstrate the ability to create basic graphs with default settings.

• Demonstrate the ability to modify axes labels and titles.

• Incorporate legends using legend().

• Demonstrate the ability to customize a plot (color, shape/linetype).

• Convert strings to dates, and format dates using the lubridate package.

## Questions

### Question 1

Let’s continue our exploration of the Zillow time series data. A useful package for dealing with dates is called `lubridate`. It is part of the famous tidyverse suite of packages. Run the code below to load it. Read the `/class/datamine/data/zillow/State_time_series.csv` dataset into a data.frame named `states`. What are the class and type of the column `Date`?

`library(lubridate)`

Items to submit
• R code used to solve the question.

• `class` and `typeof` of column `Date`.
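As a reminder of how `class` and `typeof` can differ, here is a minimal sketch on a small in-memory CSV rather than the Zillow file. The toy data and the names `csv_text`, `toy`, and `as_factor` are assumptions made only for illustration.

```r
# Hypothetical toy CSV text standing in for a real file.
csv_text <- "Date,Value
2020-01-31,10
2020-02-29,12"

# Read the text as a data.frame; keep strings as plain character vectors.
toy <- read.csv(text = csv_text, stringsAsFactors = FALSE)

class(toy$Date)    # "character"
typeof(toy$Date)   # "character"

# A factor shows how the two can disagree: the class describes the
# behavior, while typeof describes the underlying storage.
as_factor <- factor(toy$Date)
class(as_factor)   # "factor"
typeof(as_factor)  # "integer"
```

The result you get for the real `Date` column may depend on how the file is read, for example on the `stringsAsFactors` setting.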

### Question 2

Convert column `Date` to a corresponding date format using `lubridate`. Check that you correctly transformed it by checking its class like we did in question (1). Compare and contrast this method of conversion with the solution you came up with for question (3) in the previous project. Which method do you prefer?

 Take a look at the following functions from `lubridate`: `ymd`, `mdy`, `dym`.

Items to submit
• R code used to solve the question.

• `class` of modified column `Date`.

• 1-2 sentences stating which method you prefer (if any) and why.
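To illustrate how the `lubridate` parsing functions are named after the order of the date components, here is a small sketch on made-up strings (not the Zillow data). The vectors `iso_strings` and `us_strings` are hypothetical, and the base R call is shown only for comparison.

```r
library(lubridate)

# Hypothetical date strings in two different layouts.
iso_strings <- c("2020-01-31", "2020-02-29")  # year-month-day
us_strings  <- c("01/31/2020", "02/29/2020")  # month/day/year

# Pick the function whose letters match the order of the components.
iso_dates <- ymd(iso_strings)
us_dates  <- mdy(us_strings)

class(iso_dates)            # "Date"
all(iso_dates == us_dates)  # TRUE: both layouts describe the same days

# Base R equivalent, which needs an explicit format string.
base_dates <- as.Date(iso_strings, format = "%Y-%m-%d")
all(iso_dates == base_dates)  # TRUE
```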

### Question 3

Create 3 new columns in `states` called `year`, `month`, `day_of_week` (Sun-Sat) using `lubridate`. Get the frequency table for your newly created columns. Do we have the same amount of data for all years, for all months, and for all days of the week? We did something similar in question (3) in the previous project — specifically, we broke each date down by year. Which method do you prefer and why?

 Take a look at functions `month`, `year`, `day`, `wday`.
 You may find the argument of `label` in `wday` useful.

Items to submit
• R code used to solve the question.

• Frequency table for newly created columns.

• 1-2 sentences answering whether or not we have the same amount of data for all years, months, and days of the week.

• 1-2 sentences stating which method you prefer (if any) and why.
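The accessor functions mentioned in the hints can be tried on a few made-up dates before applying them to a whole column. This is only a sketch; `toy_dates` is a hypothetical vector, and the numeric weekday values assume the default `lubridate` setting in which the week starts on Sunday.

```r
library(lubridate)

# Hypothetical dates used only for illustration.
toy_dates <- ymd(c("2019-12-31", "2020-01-31", "2020-02-29", "2020-02-01"))

year(toy_dates)   # 2019 2020 2020 2020
month(toy_dates)  # 12 1 2 2
wday(toy_dates)   # 3 6 7 7 (1 = Sunday with the default week start)

# label = TRUE returns an ordered factor of weekday names instead of numbers.
weekday_labels <- wday(toy_dates, label = TRUE)

# table() counts how many observations fall in each group.
year_counts <- table(year(toy_dates))
year_counts
```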

### Question 4

Is there a better month or set of months to put your house on the market? Use `tapply` to compare the average `DaysOnZillow_AllHomes` for all months. Make a barplot showing your results. Make sure your barplot includes "all of the fixings" (title, labeled axes, legend if necessary, etc.) and looks good.

 If you want to have the month’s abbreviation in your plot, you may find both the `month.abb` object and the argument `names.arg` in `barplot` useful.

Items to submit
• R code used to solve the question.

• The barplot of the average `DaysOnZillow_AllHomes` for all months.

• 1-2 sentences answering the question "Is there a better time to put your house on the market?" based on your results.
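The combination of `tapply`, `month.abb`, and `names.arg` can be sketched on a tiny made-up data.frame. The data, the column names `month` and `days`, and the labels below are assumptions for illustration; `na.rm = TRUE` is included on the assumption that the real column may contain missing values.

```r
# Hypothetical toy data: a month number and a days-on-market value per row.
toy <- data.frame(
  month = c(1, 1, 2, 2, 3, 3),
  days  = c(100, 110, 90, NA, 80, 70)
)

# Average of `days` within each month, ignoring missing values.
avg_days <- tapply(toy$days, toy$month, mean, na.rm = TRUE)
avg_days  # month 1: 105, month 2: 90, month 3: 75

# The names of avg_days are the month numbers, so they can index month.abb.
month_labels <- month.abb[as.integer(names(avg_days))]

barplot(
  avg_days,
  names.arg = month_labels,
  main = "Average days on market by month (toy data)",
  xlab = "Month",
  ylab = "Average days"
)
```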

### Question 5

Filter the `states` data to contain only the years 2010 and later, and call the result `states2010plus`. Make a lineplot showing the average `DaysOnZillow_AllHomes` by `Date` using the `states2010plus` data. Can you spot any trends? Write 1-2 sentences explaining what (if any) trends you see.

Items to submit
• R code used to solve the question.

• The time series lineplot for the average `DaysOnZillow_AllHomes` per date.

• 1-2 sentences commenting on the patterns found in the plot, and your impressions of it.
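One possible way to filter by year and then plot an average per date is sketched below on made-up data. The data.frame `toy`, its column `days`, and the helper names `recent` and `avg_by_date` are hypothetical; other approaches (for example `subset`) would work equally well.

```r
library(lubridate)

# Hypothetical toy data with two rows sharing the same date.
toy <- data.frame(
  Date = ymd(c("2009-12-31", "2010-01-31", "2010-01-31", "2010-02-28")),
  days = c(120, 100, 110, 90)
)

# Logical indexing: keep only rows whose year is 2010 or later.
recent <- toy[year(toy$Date) >= 2010, ]

# Average per date; the names of the result are the dates as strings.
avg_by_date <- tapply(recent$days, recent$Date, mean, na.rm = TRUE)

# Convert the names back to dates for the x-axis and draw a line.
plot(
  as.Date(names(avg_by_date)),
  as.numeric(avg_by_date),
  type = "l",
  main = "Average days on market over time (toy data)",
  xlab = "Date",
  ylab = "Average days"
)
```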

### Question 6

Do homes sell faster in certain states? For the following states: 'California', 'Indiana', 'NewYork' and 'Florida', make a lineplot for `DaysOnZillow_AllHomes` by `Date` with one line per state. Use the `states2010plus` dataset for this question. Make sure to have each state line colored differently, and to add a legend to your plot. Examine the plot and write 1-2 sentences about any observations you have.

 You may want to use the `lines` function to add the lines for the different states.
 Make sure to fix the y-axis limits using the `ylim` argument in `plot` to properly show all four lines.
 You may find the argument `col` useful to change the color of your line.
 To make your legend fit, consider using the state abbreviations, and the arguments `ncol` and `cex` of the `legend` function.

Items to submit
• R code used to solve the question.

• The time series lineplot for `DaysOnZillow_AllHomes` per date for the 4 states.

• 1-2 sentences commenting on the patterns found in the plot, and your answer to the question "Do homes sell faster in certain states rather than others?".
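The pattern from the hints (one `plot` call, then `lines`, a shared `ylim`, and a `legend`) can be sketched with two made-up groups. The regions "A" and "B", the values, and the colors are assumptions for illustration; base `as.Date` is used here only to keep the sketch short.

```r
# Hypothetical toy data: three dates for each of two regions.
toy <- data.frame(
  Date   = rep(as.Date(c("2010-01-31", "2010-02-28", "2010-03-31")), times = 2),
  region = rep(c("A", "B"), each = 3),
  days   = c(100, 95, 90, 120, 125, 130)
)

region_a <- toy[toy$region == "A", ]
region_b <- toy[toy$region == "B", ]

# Compute the y-axis limits from all groups so that no line is cut off.
y_range <- range(toy$days, na.rm = TRUE)

# The first group sets up the axes; the later groups are added with lines().
plot(
  region_a$Date, region_a$days,
  type = "l", col = "blue", ylim = y_range,
  main = "Days on market by region (toy data)",
  xlab = "Date", ylab = "Days"
)
lines(region_b$Date, region_b$days, col = "red")

# ncol and cex help a legend with several entries fit inside the plot.
legend(
  "topleft",
  legend = c("A", "B"),
  col = c("blue", "red"),
  lty = 1, ncol = 2, cex = 0.8
)
```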