TDM 10200: Project 10 — Spring 2023

Motivation: Being able to create accurate and telling visualizations is a skill that we can develop. Being able to analyze and create good visualizations is an invaluable tool.

Context: We are going to take a bit of a pause and work on learning some ways to create visualizations. Examine some plots, write about them, and then use your creative minds to create visualizations about the data.

Scope: python, visualizing data

Make sure to read about, and use the template found here, and the important information about projects submissions here.

Dataset(s)

/anvil/projects/tdm/data/beer/beers.csv

Insider Knowledge

Python has several packages that help with creating data visualizations. Listed below are some of the most popular packages, these include (but are not limited to) * Matplotlib: a 2-D plotting library * Works with NumPy arrays and allows for a large number of plots to help easier understand trends and make correlations. It is not ideal for time series data * Plotly: allows for the creation of easy to understand interactive plots. * Has 40 unique chart and plot types, but is not beginner friendly * GGplot: One of the more popular in the Python library It maps data and allows for attributes to be changed including color, shape, and even geometric objects. * Can store data in a dataframe, you can build informative visualizations because of the different ways you can represent the data. * Pygal: Allows the download of visualizations into different formats. Can be used to create an interactive experience. * It can become slow if it has too large of number of data points, but it allows users to still create wonderful visualizations even in complex problems. * Geoplotlib: Buildable maps and plot geographical data using this library. It is able to use large datasets. * Has the ability to create various maps, including dot maps, heat maps, area maps, and point density maps. * Plotnine: Based on R’s ggplot2 package, it supports the creation of complex plots from data in a dataframe. * Seaborn: Based on matplotlib. It can efficiently represent data that is stored in a table, array, list and other data structures.

ONE

Creating More Effective Graphs by Dr. Naomi Robbins and The Elements of Graphing Data by Dr. William Cleveland at Purdue University, are two excellent books about data visualization. Read the following excerpts from the books (respectively), and list 2 things you learned (or found interesting) from each book (i.e., 4 things you learned altogether).

Items to submit
  • Answers as to what two things you found interesting or learned from each book excerpt.

TWO

Data visualizations are an important part of data analysis. Visualizations can summarize large amounts of data in a picture. There are many ways to choose to represent the data. Sometimes the hardest part of the analysis process is finding which chart is best to represent your data.

Read in the dataset /anvil/projects/tdm/data/beer/beers.csv. Take a moment to look at the columns and the head and tail of the data. What is some information that you could use to create a data visualization?

Give 3 different types of information that you could show from this dataframe. Then tell which type of chart you would use and why.

Example: Show a country’s beer lineup and the average ratings for each beer.

Insider Knowledge

Common reasons you would use data visualizations: * showing change over time * showing part-to-whole composition * showing how data is distributed * comparing values between groups * observing relationships between variables * looking at geographical data "How to Choose the Right Data Visualization" by Mike Yi and Mel Restori

"Essential Chart Types for Data Visualization" by Mike Yi and Mary Sapountzis

Items to submit
  • 3 types of information you would show from this dataframe and which chart you would choose to represent it.

THREE

Using the pycountry library, take the country abbreviations and create a new column with the full country names. Rename the country column to country_code and the new column should be named country_name.

Find how many times each type of beer (in the style column) occurs in each country. Create a new dataframe with this information.

Make a visualization that compares beers made in eastern Europe.

Helpful Hint

Eastern European countries include Albania, Bosnia and Herzegovina, Bulgaria, Croatia, the Czech Republic, Estonia, Hungary, Kosovo, Latvia, Lithuania, the Republic of North Macedonia, Moldova, Montenegro, Poland, Romania, Serbia, Slovakia, Slovenia and Ukraine.

Items to submit
  • Code used to solve this problem.

  • Output from running the code.

FOUR

Take the 3 things that you listed in question TWO and go ahead and create the visualizations for each of them.

Items to submit
  • The three visualizations

  • Code used to solve this problem.

  • Output from running the code.

Please make sure to double check that your submission is complete, and contains all of your code and output before submitting. If you are on a spotty internet connection, it is recommended to download your submission after submitting it to make sure what you think you submitted, was what you actually submitted.

In addition, please review our submission guidelines before submitting your project.