STAT 29000: Project 7 — Spring 2021

Motivation: Being able to analyze and create good visualizations is a skill that is invaluable in many fields. It can be pretty fun too! As you probably noticed in the previous project, matplotlib can be finicky — certain types of plots are really easy to create, while others are not. For example, you would think changing the color of a boxplot would be easy to do in matplotlib, perhaps we just need to add an option to the function call. As it turns out, this isn’t so straightforward (as illustrated at the end of [this section](#p-matplotlib-boxplot)). Occasionally this will happen and that is when packages like seaborn or plotnine (both are packages built using matplotlib) can be good. In this project we will explore this a little bit, and learn about some useful pandas functions to help shape your data in a format that any given package requires.

Context: In the next project, we will continue to learn about and become comfortable using matplotlib, seaborn, and plotnine.

Scope: python, visualizing data

Learning objectives
  • Demonstrate the ability to create basic graphs with default settings.

  • Demonstrate the ability to modify axes labels and titles.

  • Demonstrate the ability to customize a plot (color, shape/linetype).

Make sure to read about, and use the template found here, and the important information about projects submissions here.

Dataset

The following questions will use the dataset found in Scholar:

/class/datamine/data/apple/health/watch_dump.xml

Questions

Question 1

In an earlier project we explored some XML data in the form of an Apple Watch data dump. Most health-related apps give you some sort of graph or set of graphs as an output. Use any package you want to parse the XML data. Record is a dense category in this dataset. Each Record has an attribute called creationDate. Create a barplot of the number of Records per day. Make sure your plot is polished, containing proper labels and good colors.

You could start by parsing out the required data into a pandas dataframe or series.

The groupby method is one of the most useful pandas methods. It allows you to quickly perform operations on groups of data.

Items to submit
  • Python code used to solve the problem.

  • Output from running your code (including the graphic).

Question 2

The plot in question 1 should look bimodal. Let’s focus only on the first apparent group of readings. Create a new dataframe containing only the readings for the time period from 9/1/2017 to 5/31/2019. How many Records are there in that time period?

Items to submit
  • Python code used to solve the problem.

  • Output from running your code.

Question 3

It is hard to discern weekly patterns (if any) based on the graphics created so far. For the period of time in question 2, create a labeled bar plot for the count of Record by day of the week. What (if any) discernable patterns are there? Make sure to include the labels provided below:

labels = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
Items to submit
  • Python code used to solve the problem.

  • Output from running your code (including the graphic).

Question 4

Create a pandas dataframe containing the following data from watch_dump.xml:

  • A column called bpm with the bpm (beats per minute) of the InstantaneousBeatsPerMinute.

  • A column called time with the time of each individual bpm reading in InstantaneousBeatsPerMinute.

  • A column called date with the date.

  • A column called dayofweek with the day of the week.

You may want to use pd.to_numeric to convert the bpm column to a numeric type.

This is one way to convert the numbers 0-6 to days of the week:

myDF['dayofweek'] = myDF['dayofweek'].map({0:"Mon", 1:"Tue", 2:"Wed", 3:"Thu", 4:"Fri", 5: "Sat", 6: "Sun"})
Items to submit
  • Python code used to solve the problem.

  • Output from running your code.

Question 5

Create a heatmap using seaborn, where the y-axis shows the day of the week ("Mon" - "Sun"), the x-axis shows the hour, and the values on the interior of the plot are the average bpm by hour by day of the week.

Items to submit
  • Python code used to solve the problem.

  • Output from running your code (including the graphic).