TDM 10100: Project 12 — 2022

In the previous project we manipulated dates, this project we are going to take it a bit further and use Tidyverse, more specifically the Luberdate package. Working with dates in R can require more attention than working with other object classes. These packages will help simplify some of the common tasks related to date data.

Dates and times can be complicated, not every year has 365 days, not every day has 24 hours, and not every minute has 60 seconds. Dates are difficult because they have to accommodate for the Earth’s rotation and orbit around the sun as well as the occurrence of timezones, daylight savings etc. Suffice to say that when focusing on dates and date-times in R the simpler the better. Lubridate helps do so.

Learning Objectives
  • Read and write basic (csv) data.

  • Explain and demonstrate: positional, named, and logical indexing.

  • Utilize apply functions in order to solve a data-driven problem.

  • Gain proficiency using split, merge, and subset.

  • Demonstrate the ability to create basic graphs with default settings.

  • Demonstrate the ability to modify axes labels and titles.

  • Incorporate legends using legend().

  • Demonstrate the ability to customize a plot (color, shape/linetype).

  • Convert strings to dates, and format dates using the lubridate package.

Make sure to read about, and use the template found here, and the important information about projects submissions here.

Dataset(s)

The following questions will use the following dataset(s):

  • /anvil/projects/tdm/data/zillow/State_time_series.csv

Questions

First lets import the libraries

  • data.table

  • lubridate

library(data.table)  # make sure to load data.table first
library(lubridate)   # and then to load lubridate second; it will give you a warning in pink color but it is totally OK
# You need to load `data.table` first and `lubridate` second for this project, because they both define `wday` and we want the version from `lubridate` so we need to load it second!

We are going to continue to dig into the Zillow time series data.

ONE

  1. Go ahead and read in the dataset as states

  2. Find the class and the type of the column named Date

  3. Are there multiple functions that will return the same or similar information?

Insider Knowledge

Reminder:
- class shows the class of the specified object used as the arguments. The most common ones include but are not limited to: "numeric", "character", "logical", "date".
- typeof shows you the type or storage mode of objects. The most common ones include but are not limited to: "logical", "integer", "double", "complex", "character", "raw" and "list"

Items to submit
  • Code used to solve this problem.

  • Output from running the code.

TWO

  1. In Project 11, we had to convert the Date column to a month, day, year format. Now convert the column Date into values from the class Date. (You can use lubridate to do so.) What do you think about the methods you have learned (so far) to convert dates?

  2. Create a new column in your data.frame states named day_of_the_week that shows (Sunday-Saturday).

  3. Lets create another column in the data.frame states that shows the days of the week as numbers.

county$Date <- as.Date(county$Date, format="%Y-%m-%d")
Helpful Hint

Take a look at the functions ymd, mdy, dym

Helpful Hint
  • Take a look at the functions month, year, day, wday.

  • The label argument is logical. It is also only available for wday() function. TRUE will display the day of the week as an ordered factor of character strings, such as "Sunday." FALSE will display the day of the week as a number.

  • The week_start argument by default the days are counted as 1 means Monday, 7 means Sunday When label = TRUE, this will be the first level of the returned factor. You can set lubridate.week.start option to control this parameter.

Insider Knowledge

Default values of class Date in R is displayed as YYYY-MM-DD

Items to submit
  • Code used to solve this problem.

  • Output from running the code.

THREE

We want to see if there is a better month(s) for putting our house on the market?

  1. Use tapply to compare the average DaysOnZillow_AllHomes for all months.

  2. Make a barplot showing our results.

Items to submit
  • Code used to solve this problem.

  • Output from running the code.

FOUR

Find the information only for the year 2017 and call it states2017. Then create a lineplot that shows the average DaysOnZillow_AllHomes by Date using the states2017 data. What do you notice? When was the best month/months for posting a home for sale in 2017?

FIVE

Now we want to know if homes sell faster in different states? Lets look at Indiana, Maine, and Hawaii. Create a lineplot that uses DaysOnZillow_AllHomes by Date with one line per state. Use the states2017 dataset for this question. Make sure to have each state line colored differently and have a legend to identify which is which.

Helpful Hint

Use the lines() function to add lines to your plot
Use the ylim argument to show all lines
Use the col argument to identify and alter colors.

Items to submit
  • Code used to solve this problem.

  • Output from running the code.

Please make sure to double check that your submission is complete, and contains all of your code and output before submitting. If you are on a spotty internet connection, it is recommended to download your submission after submitting it to make sure what you think you submitted, was what you actually submitted.

In addition, please review our submission guidelines before submitting your project.