TDM 10200: Project 2 — 2024

Motivation: Pandas will enable us to work with data in Python. If you were enrolled in The Data Mine in the fall semester, you will recognize some similarities to data frames from R. If you are new to The Data Mine, you will likely find that Pandas makes it easy to work with data. Matplotlib is a widely used Python library for creating visualizations.

Context: This is our second project, and we will continue to introduce some basic data types and basic operations using pandas and matplotlib.

Scope: tuples, lists, pandas, matplotlib

Learning Objectives
  • Become familiar with Python data types

  • Perform basic pandas operations

  • Perform basic matplotlib operations

Dataset(s)

You will use the following dataset(s) for the questions:

  • /anvil/projects/tdm/data/craigslist/vehicles.csv

Readings and Resources

  • Make sure to read about, and use, the template found here, and read the important information about project submissions here.

  • Please review the following Examples Book pages before you start the project, and be sure to try some of these examples! These will help you be prepared for the project questions below.

Questions

Question 1 (2 pts)

  1. Create a list called mydata that contains 6 tuples. Each tuple should have a student’s first name, age and major. (You may make up the students' information.)

  2. Use a DataFrame Constructor to convert mydata into a DataFrame named studentDF.

  3. Use "iloc[]" to extract and display the second student’s information in the DataFrame.

You can find more information about "iloc[]" here.
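The three steps above can be sketched on a small example. This is only an illustration under stated assumptions: the records are made up, and the names `pets`, `petDF`, and the column labels are hypothetical, not the ones the question asks for. Note that "iloc[]" selects by position and starts counting at 0, so the second row is at position 1.

```python
import pandas as pd

# Hypothetical data: each tuple is (name, age, species)
pets = [
    ("Rex", 4, "dog"),
    ("Milo", 2, "cat"),
    ("Kiwi", 7, "parrot"),
]

# The DataFrame constructor accepts a list of tuples;
# the `columns` argument gives a label to each field of the tuples
petDF = pd.DataFrame(pets, columns=["name", "age", "species"])

# iloc[] selects by integer position, counting from 0,
# so position 1 is the second row
second_row = petDF.iloc[1]
print(second_row)
```

Passing a single integer to "iloc[]" returns that row as a Series whose labels are the column names.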

Question 2 (2 pts)

For Question 2, make sure your Jupyter Lab session uses 3 cores before you run:

import pandas as pd
myDF = pd.read_csv("/anvil/projects/tdm/data/craigslist/vehicles.csv")

If you started your Jupyter Lab session with only 1 core, close it and start a new session that uses 3 cores. Otherwise, your kernel will crash when you load the data.

  1. Read in the dataset /anvil/projects/tdm/data/craigslist/vehicles.csv into a pandas DataFrame called myDF. (Optional: If you want to, you can use the first column id as the DataFrame’s index, but this is not required.)

  2. Display the first and last five rows of the myDF DataFrame.

.head()
.tail()
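As a rough sketch of how these pieces fit together, the example below reads a tiny made-up CSV from an in-memory string instead of the real file. The name `demoDF` and the CSV contents are assumptions for illustration only. It also shows the optional `index_col` argument of `pd.read_csv`, which is one way to use the first column as the index.

```python
import io
import pandas as pd

# Hypothetical stand-in for a CSV file on disk, with an `id` column first
csv_text = "id,price,state\n" + "\n".join(
    f"{i},{1000 * i},in" for i in range(1, 13)
)

# index_col=0 uses the first column (id) as the row index;
# leave it out to keep the default 0, 1, 2, ... index
demoDF = pd.read_csv(io.StringIO(csv_text), index_col=0)

first_rows = demoDF.head()  # first 5 rows by default
last_rows = demoDF.tail()   # last 5 rows by default
print(first_rows)
print(last_rows)
```

Both methods accept an optional row count, for example `demoDF.head(10)`.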

Question 3 (2 pts)

  1. Display how many rows and columns there are in the entire DataFrame myDF.

  2. Display a list of all the column names in the DataFrame myDF.

You can revisit the functions given in Project 1, Question 5, to help with both parts of this question.
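For reference, one common way to get this information in pandas is through the `shape` and `columns` attributes. This sketch uses a small made-up DataFrame named `demoDF` (a hypothetical name), and it may or may not match the functions referred to in Project 1.

```python
import pandas as pd

# Hypothetical toy data, not the vehicles dataset
demoDF = pd.DataFrame({"price": [4500, 12000, 800], "state": ["in", "tx", "in"]})

# shape is a (rows, columns) tuple
n_rows, n_cols = demoDF.shape
print(n_rows, n_cols)

# columns is an Index object; list() turns it into a plain Python list
column_names = list(demoDF.columns)
print(column_names)
```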

Question 4 (2 pts)

Use the data from myDF to answer the following questions:

  1. How many vehicles have a price that is strictly larger than $6000?

  2. How many vehicles are from Indiana? How many are from Texas?

  3. Display all of the regions listed in the data frame. You can use the unique() method on the region column of myDF. How many different regions appear altogether (counting each region just once)?

We added a video about counting the number of entries per state. The video uses a different data set (breweries instead of vehicles), but the method for counting items per state is the same, so it should help guide you in solving Question 4.
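The counting techniques involved can be sketched on toy data. Everything in this example is an assumption for illustration: the DataFrame `demoDF`, its values, and the lowercase two-letter state codes (check how states are recorded in the real file before filtering on them).

```python
import pandas as pd

# Hypothetical toy data, not the vehicles dataset
demoDF = pd.DataFrame({
    "price": [4500, 6000, 12000, 800, 7500],
    "state": ["in", "tx", "in", "oh", "tx"],
    "region": ["indianapolis", "austin", "bloomington", "columbus", "austin"],
})

# A comparison yields a boolean Series; True counts as 1 when summed,
# so the sum is the number of rows that satisfy the condition
n_expensive = int((demoDF["price"] > 6000).sum())

# value_counts() tallies how often each distinct value appears
per_state = demoDF["state"].value_counts()
n_indiana = int(per_state["in"])

# unique() lists each distinct value once; nunique() counts them
regions = demoDF["region"].unique()
n_regions = demoDF["region"].nunique()

print(n_expensive, n_indiana, n_regions)
print(regions)
```

Because the comparison is strict, the row priced at exactly 6000 is not counted in `n_expensive`.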

Question 5 (2 pts)

  1. Plot a bar chart that shows, for each state, the number of vehicles whose price is strictly lower than $6000.

We added a two-part video about making such a bar chart. See the part 1 video and the part 2 video. Note: The example videos are about the number of reviews per user (instead of the number of vehicles per state), but the method is the same, and these videos should help to guide your work on Question 5.
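A minimal sketch of the filter, count, and plot pattern is shown below on toy data. The DataFrame `demoDF`, its values, and the axis labels are assumptions for illustration, and this is only one of several ways to draw such a chart (the videos may use a different approach). The `matplotlib.use("Agg")` line selects a non-interactive backend so the sketch runs without a display; it is not needed inside Jupyter Lab.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in Jupyter Lab
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data, not the vehicles dataset
demoDF = pd.DataFrame({
    "price": [4500, 6000, 12000, 800, 3000, 5999],
    "state": ["in", "tx", "in", "oh", "tx", "in"],
})

# Keep only the rows whose price is strictly below the threshold
cheap = demoDF[demoDF["price"] < 6000]

# Count the remaining rows per state, ordered alphabetically by state
counts = cheap["state"].value_counts().sort_index()

# One bar per state, with bar height equal to the count
fig, ax = plt.subplots()
ax.bar(counts.index, counts.values)
ax.set_xlabel("state")
ax.set_ylabel("number of vehicles")
ax.set_title("Vehicles with price < 6000, by state")
```

In a notebook, the figure is displayed when the cell finishes, or you can call `plt.show()` explicitly.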

Project 02 Assignment Checklist

  • Jupyter Lab notebook with your code, comments and output for the assignment

    • firstname-lastname-project02.ipynb

  • Python file with code and comments for the assignment

    • firstname-lastname-project02.py

  • Submit files through Gradescope

Please double-check that your submission is complete and contains all of your code and output before submitting. If you are on a spotty internet connection, we recommend downloading your submission after submitting it, to make sure that what you think you submitted is what you actually submitted.

In addition, please review our submission guidelines before submitting your project.