TDM 10200: Project 4 — 2023

Motivation: In the last project, we spent time using if statements and for loops. Today we are going to take a step back and learn more about loops. We will cover three kinds: for loops, while loops, and nested loops. We will also talk about tuples and lists, and learn about one of the most useful data structures in Python, the dictionary, commonly referred to as a dict.

Context: We will continue to introduce some basic data types and go through control flow concepts similar to those we covered in R.

Scope: tuples, lists, loops, dict

Make sure to read about and use the template found here, as well as the important information about project submissions here.

Dataset(s)

The following questions will use the following dataset(s):

/anvil/projects/tdm/data/craigslist/vehicles.csv

Questions

Read in the dataset and name it cars.

Helpful Hint
import pandas as pd
cars = pd.read_csv("/anvil/projects/tdm/data/craigslist/vehicles.csv")

ONE

A dict contains a collection of key-value pairs.

NFL_team = dict([
            ('Indianapolis', 'Colts'),
            ('Kansas City', 'Chiefs'),
            ('Philadelphia', 'Eagles'),
            ('Minnesota', 'Vikings'),
            ('New England', 'Patriots'),
            ('Miami', 'Dolphins')
])
print(NFL_team)
# output of code
{'Indianapolis': 'Colts', 'Kansas City': 'Chiefs', 'Philadelphia': 'Eagles', 'Minnesota': 'Vikings', 'New England': 'Patriots', 'Miami': 'Dolphins'}

There are two primary ways to retrieve information from a dict.

  • mydict.get()

  • mydict[]
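
The difference between the two shows up when a key is missing. Here is a minimal sketch, using a shortened two-team dict in the style of the example above; 'Chicago' is just an illustrative key that is not in the dict.

```python
# A shortened dict in the style of the NFL_team example
NFL_team = {'Kansas City': 'Chiefs', 'Miami': 'Dolphins'}

# Square brackets return the value for a key that exists
print(NFL_team['Miami'])

# .get() returns the same value for a key that exists
print(NFL_team.get('Miami'))

# .get() returns None when the key is missing
print(NFL_team.get('Chicago'))

# .get() also accepts a default to return instead of None
print(NFL_team.get('Chicago', 'unknown'))

# Square brackets raise a KeyError when the key is missing
try:
    NFL_team['Chicago']
except KeyError:
    print("'Chicago' is not a key in NFL_team")
```

As a rule of thumb, square brackets suit keys you expect to exist, while .get() is convenient when a missing key is a normal case.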

  1. Create a dictionary of MLB teams for the American League, call it MLB_teams

  2. Now add the MLB teams from the National League to the current dict.

  3. Delete all the teams that are South of Tennessee and North Carolina. (Mississippi, Alabama, Georgia, South Carolina, and Florida.)
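
Adding and removing entries works the same way for any dict. The sketch below uses a small dict of state capitals (state_capitals is a hypothetical name chosen for illustration, not part of the project data) to show the usual operations.

```python
# A small illustrative dict, unrelated to the project data
state_capitals = {'Indiana': 'Indianapolis', 'Ohio': 'Columbus'}

# Add a single key-value pair with square brackets
state_capitals['Illinois'] = 'Springfield'

# Add several pairs at once with update()
state_capitals.update({'Georgia': 'Atlanta', 'Florida': 'Tallahassee'})

# Remove an entry with del
del state_capitals['Georgia']

# pop() removes an entry and returns its value
removed = state_capitals.pop('Florida')

print(removed)
print(state_capitals)
```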

Items to submit
  • Code used to answer the question.

  • Result of code.

TWO

Loops are important in any programming language, because they help to execute code repeatedly.

A while loop executes a block of statements repeatedly, for as long as a given condition remains true.

It looks a bit like this:

count = 0
while count < 15:
    count = count + 2
    print("Yay!")

You can also pair an else statement with a while loop. The else block runs once, when the while condition becomes false. It is skipped if the loop exits early through a break statement.

while condition:
    # executes specific statements
else:
    # executes specific statements

We can add an else statement that will print "Boo!" once the condition count < 15 is no longer true (at the end of the loop).

count = 0
while count < 15:
    count = count + 2
    print("Yay!")
else:
    print("Boo!")
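
One detail worth knowing is that the else block does not run if the loop is exited with break. A minimal sketch based on the same counter, where the early-exit threshold of 6 is chosen only for illustration:

```python
count = 0
while count < 15:
    count = count + 2
    if count >= 6:
        # leave the loop early; the else block below is skipped
        print("Stopping early at", count)
        break
    print("Yay!")
else:
    # runs only if the loop ended because count < 15 became false
    print("Boo!")
```

Here "Yay!" is printed twice and "Boo!" is never printed, because the loop ends through break rather than through the condition.
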
  1. Use a while loop to print a series of numbers from 0 to 200, counting by 10s.

  2. Put the phrase "Old McDonald had a farm e-i-e-i-o" into a string and call it words. Print everything in the string EXCEPT the letter a

  3. Now take words and replace each occurrence of the symbol - with an asterisk *

Items to submit
  • Code used to answer the question.

  • Result of code.

THREE

A for loop is typically used to go through a list, an array, or a string. It runs the same block of code once for each item in the sequence. A while loop runs until its condition becomes false, whereas a for loop iterates over the items in a sequence.

for iterator_variable in sequence_name:
    statements
    ...
    statements

Insider information

- The first word of the statement is for, which marks the beginning of the for loop.
- The iterator variable is a variable that takes a new value each time the loop body is executed.
- The keyword in tells the loop which sequence the iterator variable takes its values from.
- The statements allow you to perform various operations on each pass through the loop.
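
The pieces described above can be seen in a short sketch. The names colors, color, letters, and letter are made up for illustration.

```python
# a hypothetical list, used only for illustration
colors = ['red', 'green', 'blue']

# color is the iterator variable; it takes each element of colors in turn
for color in colors:
    print(color)

# a string is also a sequence, so a for loop visits one character at a time
letters = []
for letter in 'loop':
    letters.append(letter.upper())

print(letters)
```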

Helpful Hint
  • enumerate() The function enumerate() allows us to iterate through a sequence while keeping track of both the index and the element. The result can also be converted into a list of tuples using the list() function.

# create list of fruit
fruit = ['cherry', 'banana', 'orange', 'kiwi', 'apple']
# enumerate fruit, starting at 1 since the default is 0
num_fruit = enumerate(fruit, start=1)
# print the enumerate object as a list
print(list(num_fruit))
# output from code
[(1, 'cherry'), (2, 'banana'), (3, 'orange'), (4, 'kiwi'), (5, 'apple')]
  • range() This built-in Python function allows for iteration through a sequence of numbers. range() never includes the stop number in its result (6 in the example below) and starts at 0 unless a different start value is given.

range(6)
for n in range(6):
    print(n)
#output from code
0
1
2
3
4
5
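
Both functions are most often used directly inside a for loop. The following is a small sketch of that pattern. The variable names position, name, and steps are illustrative, and the range() call uses the optional start and step arguments.

```python
fruit = ['cherry', 'banana', 'orange', 'kiwi', 'apple']

# enumerate() yields (index, element) pairs, unpacked here into two variables
for position, name in enumerate(fruit, start=1):
    print(position, name)

# range(start, stop, step): starts at 2, stops before 11, counts by 3
steps = list(range(2, 11, 3))
print(steps)
```
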
  1. Create a for loop

  2. Now add in the enumerate() function to your for loop.

  3. Create a 'for' loop with the range() function

Check out the Helpful Hint for examples.

Insider Knowledge

Notice that the indexing for our dataframe starts at 0. In Python, as in many other programming languages, indexing starts at 0. In contrast, R, which we used last semester, starts indexing at 1. This is an important fact to remember.
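
A quick sketch of what zero-based indexing means for a plain Python list (the variable names are illustrative):

```python
fruit = ['cherry', 'banana', 'orange', 'kiwi', 'apple']

# the first element is at index 0, not 1
first = fruit[0]

# the last element is at index len(fruit) - 1
last = fruit[len(fruit) - 1]

# a negative index counts from the end, so -1 is also the last element
also_last = fruit[-1]

print(first, last, also_last)
```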

Items to submit
  • Code used to answer the question.

  • Result of code.

FOUR

From the dataset cars create a dict called mydict that contains key:value pairs. The keys should be the years and the values are single integers representing the number of vehicles from that year.

Helpful Hint
myyears = cars['year'].dropna().to_list()
# get a list containing each unique year
unique_years = list(set(myyears))
# for each year (key), initialize the value (value) to 0
mydict = {}
for year in unique_years:
    mydict[year] = 0
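
Once every key starts at 0, a second loop can add 1 to the matching key each time a year appears. Below is a minimal sketch of that counting pattern on a small made-up list. The names toy_years and toy_counts are hypothetical, and the values are not taken from the cars data.

```python
# a small made-up list of years, standing in for the real column
toy_years = [2011, 1989, 2011, 1997, 2011, 1989]

# initialize each unique year to 0, as in the hint above
toy_counts = {}
for year in set(toy_years):
    toy_counts[year] = 0

# add 1 to the matching key for every occurrence of a year
for year in toy_years:
    toy_counts[year] += 1

print(toy_counts[2011])
```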

From the new dictionary that you created, find the number of cars for each of these years:

  1. 2011

  2. 1989

  3. 1997

Items to submit
  • Code used to answer the question

  • Result of the code

FIVE

Now that we have a bit of familiarity with the data, let’s revisit another common Python package, called matplotlib. Let’s create some graphics using this package.

  1. Create a bar graph that has years on x-axis and number of vehicles on the y-axis

  2. Create a graph of something that you find interesting about the data.

Helpful Hint
import matplotlib.pyplot as plt
Items to submit
  • Code used to answer the question

  • Result of the code


Please double-check that your submission is complete and contains all of your code and output before submitting. If you are on a spotty internet connection, it is recommended that you download your submission after submitting it, to make sure that what you think you submitted is what you actually submitted.

In addition, please review our submission guidelines before submitting your project.