Introduction to Lists, Tuples, and Dictionaries

In this document we will:

  • Explore lists, tuples, and dictionaries, their properties, and how to use them.

  • Learn key methods to manipulate lists and access elements using indexing.

  • Use loops for repetitive tasks and create datasets dynamically.

  • Combine lists and dictionaries to build pandas DataFrames and enhance datasets.

Lists

Lists are one of the most commonly used data types in Python. The functionality covered in the following sections helps to illustrate why. Keep in mind that parentheses ( ) are associated with tuples, while square brackets [ ] are associated with lists. You can define a list using square brackets [] or the list() function.

my_list = [1,2,3,4,5.5, "some_text"]
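As a small illustration of the two ways to define a list, the sketch below builds lists with square brackets and with list(). The variable names are made up for this example.

```python
# Two ways to create a list (variable names are illustrative).
from_brackets = [1, 2, 3]          # square-bracket literal
from_function = list((1, 2, 3))    # list() applied to a tuple
from_string = list("abc")          # list() turns any iterable into a list
empty_list = list()                # same result as []

print(from_brackets == from_function)  # True
print(from_string)                     # ['a', 'b', 'c']
print(len(empty_list))                 # 0
```

The list() form is most useful when you already have another iterable, such as a tuple or a string, and want a list with the same items.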

Lists are mutable, meaning that they can be changed after they have been declared. This is notably different from tuples, which are immutable (i.e., cannot be changed).

my_list = [1,2,3,"i_can_be_muted"]
my_list[3] = "still_not_the_right_word"
print(my_list)
[1, 2, 3, 'still_not_the_right_word']

Just as a tuple can be converted to a list, a list can be converted to a tuple:

lex_listor = [1,2,3] # I am a list
super_tuple = tuple(lex_listor)
print(type(super_tuple))
<class 'tuple'>

If we wanted to extract certain elements, we could use a single index or the [start:end] slicing ability of lists.

lex_listor[0]
1
lex_listor[0:2]
[1, 2]

To modify the list, we could replace one of the list's elements.

lex_listor[0] = "yellow"
lex_listor
['yellow', 2, 3]

Tuples

Tuples are one of the primary data types utilized in Python. They are declared using parentheses ( and can contain any data type:

my_tuple = (1,2,3,4,5.5,"some_text")

In general, when you see or think "parentheses" in Python, you should think tuples. However, there is one notable exception. Python has list comprehensions, so you might expect that comprehension-style logic within parentheses would create a tuple object. For example, in the code below, you might expect the output to be a tuple with the values 0, 2, 4, 6. However, this syntax creates a generator:

not_a_tuple = (i*2 for i in range(4))
print(not_a_tuple)
<generator object <genexpr> at 0x7ff1b0188510>

In order to make sure that the output is a tuple, you would need to explicitly declare it:

now_a_tuple = tuple(i*2 for i in range(4))
print(now_a_tuple)
(0, 2, 4, 6)
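One practical consequence of getting a generator instead of a tuple is that a generator can only be consumed once. The sketch below, with illustrative variable names, shows this by converting the same generator to a tuple twice.

```python
# A generator expression produces its values lazily, one at a time.
doubles = (i * 2 for i in range(4))

first_pass = tuple(doubles)   # consumes every value in the generator
second_pass = tuple(doubles)  # the generator is now exhausted

print(first_pass)   # (0, 2, 4, 6)
print(second_pass)  # ()
```

If you need to reuse the values, store them in a tuple or a list right away.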

Another key feature of tuples is that they are immutable (not mutable). This means that, once they are declared, they cannot be modified. For example, if you try to change a single element of a tuple after it is created, then Python will give an error:

my_tuple = (1,2,3,"tuples are cool")
my_tuple[3] = "I prefer lists"
TypeError: 'tuple' object does not support item assignment
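If you want to see this error without stopping your script, one option is to wrap the assignment in a try/except block. This is only a sketch; the variable name message is introduced here for illustration.

```python
my_tuple = (1, 2, 3, "tuples are cool")

try:
    # Item assignment is not allowed on a tuple.
    my_tuple[3] = "I prefer lists"
except TypeError as error:
    message = str(error)
    print(f"Could not modify the tuple: {message}")

# The tuple is unchanged after the failed assignment.
print(my_tuple)  # (1, 2, 3, 'tuples are cool')
```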

If you do need to change an item, it is pretty straightforward to convert from a tuple to a list:

i_want_to_be_a_list = (1,2,3) # Right now it is a tuple.
now_its_a_list = list(i_want_to_be_a_list)
print(type(now_its_a_list))
<class 'list'>

You can also create a tuple of tuples!

tuple_of_tuples = (3, 'hi', False), (1,2)
tuple_of_tuples
((3, 'hi', False), (1, 2))

If we wanted to extract an element of a tuple, we could use square brackets with an index.

tuple_dif_types = (3, "hi", False)
tuple_dif_types[0]
3
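Indexing also works on a tuple of tuples: the first index picks the inner tuple and the second picks an element inside it. The sketch below uses illustrative variable names and also shows unpacking, which assigns each element of a tuple to its own variable.

```python
tuple_of_tuples = ((3, "hi", False), (1, 2))

inner = tuple_of_tuples[0]        # the first inner tuple
greeting = tuple_of_tuples[0][1]  # second element of the first inner tuple

# Unpacking: one variable per element, in order.
number, text, flag = inner

print(inner)     # (3, 'hi', False)
print(greeting)  # hi
print(number, text, flag)  # 3 hi False
```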

Indexing

R and Python are similar in a lot of ways. One key distinction to understand is how they index values. Python is 0-indexed whereas R is 1-indexed. What does this mean? In practice, this means that the placement of items in R always starts with 1. In comparison, Python starts numbering with a 0. It is a little easier to understand in an example:

my_list <- c("first", "second", "third", "fourth")

In this case, since we are looking at R code, the indexing is:

["first", "second", "third", "fourth"]
[   1,        2,        3,        4  ]

So if we wanted to access the "third" entry, we could run:

my_list[3]

However, in Python, the same list is 0-indexed. An example is included below, for comparison:

my_list = ["first", "second", "third", "fourth"]
["first", "second", "third", "fourth"]
[   0         1        2         3   ]

If we wanted to get the "third" entry in the list, we would use the code snippet below:

my_list[2]

Thankfully, some of the syntax is the same between Python and R. For example, the colon : continues to hold the same functionality between both languages (as long as you remember the index difference):

In R:

my_list[1:4]
"first", "second", "third", "fourth"

In Python, we emphasize that the last number in a range is not used. In other words, if we use 0:2 for a range of indices, then we only get entry 0 and 1 but not entry 2.

my_list[0:2]
"first", "second"

Python also supports a second : that indicates a "jump". For instance, in this case, we jump indices by 2 each time, in other words, we use every other index and skip the ones in between:

my_list[0:3:2]
"first", "third"

Sadly, Python and R do differ in other ways. One major difference is how the two languages handle negative indexes. In R, they remove a value at the given position:

my_list[c(-1,-2)]
"third", "fourth"

In Python, negative indexes just mean "start from the back of the list" instead of "start from the front". For example:

my_list[-1]
"fourth"

Negative indexes can be a little confusing in Python: positive indexes start at 0, but negative indexes start at -1, because -0 is the same as 0. This means that my_list[-4] is valid, and in this case it returns "first". However, if you tried to print my_list[4], it would produce an IndexError, because the last list value is my_list[3]. Don’t worry if this is a bit confusing at first. It gets easier as you write more Python code and practice indexing.
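The indexing rules above can be checked with a short sketch. The variable names on the left are introduced here for illustration only.

```python
my_list = ["first", "second", "third", "fourth"]

last_item = my_list[-1]     # "fourth"
first_item = my_list[-4]    # "first": -4 is the furthest valid negative index
same_as_zero = my_list[-0]  # "first": -0 equals 0, so this is my_list[0]
last_two = my_list[-2:]     # ["third", "fourth"]

try:
    my_list[4]              # one past the last valid index, my_list[3]
except IndexError as error:
    error_message = str(error)
    print(f"IndexError: {error_message}")
```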

List Methods

A method is a function that belongs to a particular object and is called with dot notation, for example my_list.append(5). When you hear or read "method", you can think of it as a function attached to an object. A list is an example of an object that you can run methods on. In Python, the most common objects, like lists, dicts, tuples, and sets, all have extremely useful methods built right in!

The following is a table of list methods from w3schools.

Method      Description

append()    Adds an element at the end of the list
clear()     Removes all the elements from the list
copy()      Returns a copy of the list
count()     Returns the number of elements with the specified value
extend()    Add the elements of a list (or any iterable), to the end of the current list
index()     Returns the index of the first element with the specified value
insert()    Adds an element at the specified position
pop()       Removes the element at the specified position
remove()    Removes the item with the specified value
reverse()   Reverses the order of the list
sort()      Sorts the list
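Several of the methods in the table (count, index, insert, copy, sort, reverse, and clear) can be tried on a small list. The list colors is a hypothetical example, not part of the datasets used elsewhere on this page.

```python
colors = ["red", "blue", "green", "blue"]  # hypothetical example list

blue_count = colors.count("blue")  # 2
first_blue = colors.index("blue")  # 1, the index of the first match

colors.insert(1, "yellow")  # ['red', 'yellow', 'blue', 'green', 'blue']
backup = colors.copy()      # independent copy of the current list

colors.sort()               # ['blue', 'blue', 'green', 'red', 'yellow']
sorted_colors = colors.copy()
colors.reverse()            # ['yellow', 'red', 'green', 'blue', 'blue']
reversed_colors = colors.copy()

colors.clear()              # empties the list in place
print(colors)               # []
print(backup)               # ['red', 'yellow', 'blue', 'green', 'blue']
```

Note that sort(), reverse(), and clear() change the list in place, which is why the sketch uses copy() to keep earlier versions.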

We can demonstrate some of these methods in the examples below. Let’s start by creating a few lists that we can use:

list_one = ["first", "second", "third", "fourth", "fifth"]
list_two = ["sixth", "seventh", "eighth", "ninth"]

What if we wanted to add the string "tenth" to list_two?

list_two.append("tenth")
print(list_two)
["sixth", "seventh", "eighth", "ninth", "tenth"]

Ok, but what if we wanted to remove fourth from list_one and then add it back?

list_one.remove("fourth") # First we can remove it.
print(list_one)
list_one.append("fourth") # Then we can add it back.
print(list_one)
["first", "second", "third", "fifth"]
["first", "second", "third", "fifth", "fourth"]

Notice that adding fourth back to the list changes its index place. In this case it goes from an index of 3 in the original list to 4 in the new list.

What if we wanted to remove the first element and save it in a new variable?

new_variable = list_one.pop(0)
print(f'The new variable: {new_variable}')
print(f'The old list: {list_one}')
The new variable: first
The old list: ['second', 'third', 'fifth', 'fourth']

These are awesome, but what if I wanted to combine the two lists into one big list?

list_one.extend(list_two)
print(list_one)
['second', 'third', 'fifth', 'fourth', 'sixth', 'seventh', 'eighth', 'ninth', 'tenth']
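A common point of confusion is the difference between append and extend when the argument is itself a list. The sketch below uses short, made-up lists to show the difference, along with the + operator, which builds a new list instead of changing an existing one.

```python
appended = ["a", "b"]
appended.append(["c", "d"])  # the whole list is added as ONE element

extended = ["a", "b"]
extended.extend(["c", "d"])  # each item is added separately

left = ["a", "b"]
combined = left + ["c", "d"]  # new list; left is not modified

print(appended)  # ['a', 'b', ['c', 'd']]
print(extended)  # ['a', 'b', 'c', 'd']
print(combined)  # ['a', 'b', 'c', 'd']
print(left)      # ['a', 'b']
```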

Creating a Dataset Using Lists

Now let’s use lists to create our own dataset about a student’s class schedule. We will start by using lists to define the variables and observations. Then we will organize the lists with a dictionary to create a DataFrame. Dictionaries are covered in more detail further below.

import pandas as pd

# Create the lists for each column
class_names = ["Statistics", "Philosophy", "History", "Engineering", "Art"]
instructors = ["Dr. Ward", "Ms. Johnson", "Mr. Lee", "Dr. Smith", "Mrs. Brown"]
days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
start_times = ["9:00 AM", "10:30 AM", "12:00 PM", "1:30 PM", "3:00 PM"]
end_times = ["10:15 AM", "11:45 AM", "1:15 PM", "2:45 PM", "4:15 PM"]

# Creating the DataFrame
my_class_schedule = pd.DataFrame({
    "Class Name": class_names,
    "Instructor": instructors,
    "Day": days,
    "Start Time": start_times,
    "End Time": end_times
})


print(my_class_schedule)
    Class Name   Instructor        Day Start Time  End Time
0   Statistics     Dr. Ward     Monday    9:00 AM  10:15 AM
1   Philosophy  Ms. Johnson    Tuesday   10:30 AM  11:45 AM
2      History      Mr. Lee  Wednesday   12:00 PM   1:15 PM
3  Engineering    Dr. Smith   Thursday    1:30 PM   2:45 PM
4          Art   Mrs. Brown     Friday    3:00 PM   4:15 PM

Boom! We’ve created our own pandas DataFrame using lists and a dictionary, as shown in the output above.

Now let’s change the index numbers 0-4 to class 1, class 2, class 3, class 4, and class 5.

my_class_schedule.index = ["Class 1", "Class 2", "Class 3", "Class 4", "Class 5"]
print(my_class_schedule)
          Class Name   Instructor        Day Start Time  End Time
Class 1   Statistics     Dr. Ward     Monday    9:00 AM  10:15 AM
Class 2   Philosophy  Ms. Johnson    Tuesday   10:30 AM  11:45 AM
Class 3      History      Mr. Lee  Wednesday   12:00 PM   1:15 PM
Class 4  Engineering    Dr. Smith   Thursday    1:30 PM   2:45 PM
Class 5          Art   Mrs. Brown     Friday    3:00 PM   4:15 PM

In pandas, a DataFrame index is a set of labels that identifies each row in the DataFrame. By default, when creating a DataFrame, the index is automatically assigned as a sequence of integers starting from 0 (0, 1, 2, 3, ...). However, we can customize these index labels to make the DataFrame easier to read and interpret.
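Custom index labels also give you a readable way to look up rows. The sketch below builds a smaller, two-row version of the schedule (the name schedule and its contents are chosen for this example) and uses the pandas accessors .loc (label-based) and .iloc (position-based), plus reset_index to return to the default integer index.

```python
import pandas as pd

# A reduced, illustrative version of the class schedule.
schedule = pd.DataFrame(
    {"Class Name": ["Statistics", "Philosophy"], "Day": ["Monday", "Tuesday"]},
    index=["Class 1", "Class 2"],
)

by_label = schedule.loc["Class 2", "Day"]  # look up by row label and column name
by_position = schedule.iloc[1, 1]          # look up by row and column position

# drop=True discards the custom labels instead of keeping them as a column.
default_index = schedule.reset_index(drop=True)

print(by_label)                    # Tuesday
print(by_position)                 # Tuesday
print(list(default_index.index))   # [0, 1]
```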

Working with Lists and For Loops

Python includes various control flow mechanisms. A for loop, for instance, is used to iterate over elements in a collection (such as a list or tuple) or any iterable object. Please see our Control Flow document for more on loops.

Example

Let’s say we wanted to dynamically populate our lists and create a for loop that asks the user for the names. Here is an example you can try for yourself:

import pandas as pd
# Create the lists
class_names = []
instructors = []

# Use a for loop for class names and instructors
for i in range(2):
    class_name = input(f"The name of my class {i + 1}: ")
    instructor = input(f"The instructor's name for {class_name}: ")

    class_names.append(class_name)
    instructors.append(instructor)

# Create the DataFrame
for_loop_class_schedule = pd.DataFrame({
    "Class Name": class_names,
    "Instructor": instructors
})

print(for_loop_class_schedule)

Lists are:

  • Mutable: You can change their content after creation.

  • Ordered: Items are stored in the order they were added.

  • Dynamic: You can add or remove items as needed.

Loops are:

  • Used to perform repetitive tasks.

  • Useful for collecting data: in the example above, each pass through the loop asked for input and appended it to a list.

Key parts of the for loop:

  • Iterable: range(2) generates the numbers 0 and 1, so the loop runs twice.

  • Body of the loop: The indented code block executes for each iteration.
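The same loop pattern works without user input. The sketch below loops over two short lists that are filled in ahead of time; the names summaries and pairs are introduced for this example. It shows both the range-based style used above and zip, which pairs up items from two lists directly.

```python
class_names = ["Statistics", "Philosophy"]
instructors = ["Dr. Ward", "Ms. Johnson"]

# range(len(...)) yields the indexes 0 and 1, one per loop iteration.
summaries = []
for i in range(len(class_names)):
    summaries.append(f"Class {i + 1}: {class_names[i]} with {instructors[i]}")

# zip pairs the items directly, so no index is needed.
pairs = []
for class_name, instructor in zip(class_names, instructors):
    pairs.append((class_name, instructor))

print(summaries)
print(pairs)
```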

Dictionaries (dicts)

Dictionaries are a crucial data type used to store key:value pairs. Dictionaries are hash maps (also called hash tables): they apply a hash function to each key you insert, which links the key to a location in memory. This makes dictionaries incredibly efficient and convenient for adding, removing, and searching for data, at the expense of some extra memory.

The idea behind dictionaries (and hash maps in general) is that every key has a value attached to it. Keys must be unique, so each item has a unique identifier, and its value stores the information associated with that key.

Some examples where dictionaries would be used include:

  • Counting occurrences of a word (string: int)

  • Saving house addresses for a zip code (int: list of strings)

  • Tracking a list of people with certain titles (string: list of strings)

There will inevitably be a point where knowing how to use a dictionary will save you a lot of trouble.
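The first use case in the list, counting occurrences of a word, can be sketched in a few lines. The sentence is made up for this example, and get(word, 0) returns 0 the first time a word is seen.

```python
sentence = "the cat and the hat and the bat"  # made-up example text

word_counts = {}
for word in sentence.split():
    # Look up the current count (0 if the word is new), then add one.
    word_counts[word] = word_counts.get(word, 0) + 1

print(word_counts)
# {'the': 3, 'cat': 1, 'and': 2, 'hat': 1, 'bat': 1}
```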

You can access, insert, or set elements in a dictionary much as you would with a list or tuple, using square brackets with a key instead of a positional index.

dictionary_ex = {'x': 'my value', 'green': [5, 9, 6]} # create the dictionary
dictionary_ex
{'x': 'my value', 'green': [5, 9, 6]}
dictionary_ex['x'] # access element
'my value'
dictionary_ex[1] = 'another pair value' # insert element
dictionary_ex
{'x': 'my value', 'green': [5, 9, 6], 1: 'another pair value'}
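A few other everyday dictionary operations are worth trying alongside access and insertion. The sketch below uses a fresh copy of the example dictionary; the key "purple" and the variable names on the left are made up for illustration.

```python
dictionary_ex = {"x": "my value", "green": [5, 9, 6]}

dictionary_ex["x"] = "updated value"          # setting an existing key overwrites it
has_green = "green" in dictionary_ex          # membership test on keys
missing = dictionary_ex.get("purple", "n/a")  # default value instead of a KeyError
removed = dictionary_ex.pop("green")          # remove a key and return its value

# items() yields each key together with its value.
for key, value in dictionary_ex.items():
    print(key, value)  # x updated value
```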

Using Dictionaries to Create a New Column in Our Dataset

Let’s add a new column to the dataset we previously created using lists, named my_class_schedule. After adding the new column, we’ll take a look at the updated dataset.

print(my_class_schedule)
    Class Name   Instructor        Day Start Time  End Time
0   Statistics     Dr. Ward     Monday    9:00 AM  10:15 AM
1   Philosophy  Ms. Johnson    Tuesday   10:30 AM  11:45 AM
2      History      Mr. Lee  Wednesday   12:00 PM   1:15 PM
3  Engineering    Dr. Smith   Thursday    1:30 PM   2:45 PM
4          Art   Mrs. Brown     Friday    3:00 PM   4:15 PM

Now, let’s use a dictionary and key-value pairs to add a new column to our dataset, categorizing classes as either morning or afternoon.

import pandas as pd

# mapping dictionary for Morning/Afternoon
time_mapping = {
    "9:00 AM": "Morning",
    "10:30 AM": "Morning",
    "12:00 PM": "Afternoon",
    "1:30 PM": "Afternoon",
    "3:00 PM": "Afternoon"
}

# Create a new column using the mapping
my_class_schedule["TimeOfDay"] = my_class_schedule["Start Time"].map(time_mapping)

# Display the updated DataFrame
print(my_class_schedule)
    Class Name   Instructor        Day Start Time  End Time  TimeOfDay
0   Statistics     Dr. Ward     Monday    9:00 AM  10:15 AM    Morning
1   Philosophy  Ms. Johnson    Tuesday   10:30 AM  11:45 AM    Morning
2      History      Mr. Lee  Wednesday   12:00 PM   1:15 PM  Afternoon
3  Engineering    Dr. Smith   Thursday    1:30 PM   2:45 PM  Afternoon
4          Art   Mrs. Brown     Friday    3:00 PM   4:15 PM  Afternoon

Using a dictionary with the map method lets you add a new column to the dataset, categorizing each class based on its start time. This is a common approach when you need to add a new column based on predefined rules.
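One detail to watch for: when a value has no matching key in the dictionary, map produces a missing value (NaN) for that row. The sketch below uses a small, made-up Series with an unmapped "6:00 PM" entry and fills the gap with the label "Unknown", which is an arbitrary choice for this example.

```python
import pandas as pd

# Illustrative data: "6:00 PM" has no entry in the mapping.
start_times = pd.Series(["9:00 AM", "1:30 PM", "6:00 PM"])
time_mapping = {"9:00 AM": "Morning", "1:30 PM": "Afternoon"}

mapped = start_times.map(time_mapping)  # the unmapped value becomes NaN
filled = mapped.fillna("Unknown")       # replace NaN with a fallback label

print(mapped.isna().tolist())  # [False, False, True]
print(filled.tolist())         # ['Morning', 'Afternoon', 'Unknown']
```

Checking for missing values after a map call is a quick way to confirm that the dictionary covers every value in the column.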