Sets

Introduction

In Python, lists and sets are both mutable. The key difference is that sets are unordered and cannot have duplicate elements in a set. Sets are particularly useful for membership testing and eliminating said duplicates, even if lists are more useful overall.


Implementation

Sets can be declared using the set() function or the curly braces {}. There are important differences here that are explained easier with code than with words:

fruits = {'apple', 'banana', 'strawberry'}
type(fruits)

veggies = set('squash', 'zucchini', 'cauliflower')

structure = {}
type(structure)

empty = set()
type(empty)
<class 'set'>
`TypeError: set expected at most 1 argument, got 3`
<class 'dict'>
<class 'set'>

From this example, we see that filling curly braces with items results in a set, but using set() does not work — the function takes at most 1 argument. With this information, how to we declare an empty set? We see that using just { } results in an empty dictionary, given that the two data structures share notation, but set() with no parameters will give us an empty set.

Below is a list of set functions w3schools.com, some of which we discuss later.

Method Alternative Code Description

add()

Adds an element to a set

clear()

Removes all the elements from a set

copy()

Returns a copy of a set

difference(*other)

this - other

Returns a (new) list of elements in this set that are not in the other. Can list multiple arguments in the function or a - b - c for subtraction.

difference_update(*other)

In-place version of difference(). Removes the shared item(s) from the original list. Cannot be called and printed simultaneously.

discard() / remove()

Both functions remove the item from a set — remove() will raise an error if the item is not in the set, discard() will not.

intersection(*other)

this & other

Returns a set that includes the values that are in both sets.

intersection_update(*other)

In-place version of intersection(). Changes the original list to only include all the items shared between the set(s). Cannot be called and printed simultaneously.

isdisjoint(other)

Compares exactly two sets. Returns True if there are no shared elements between the two, False otherwise.

issubset(other)

Compares exactly two sets. Returns True if all the elements of this set are in the other set, False otherwise.

issuperset(other)

Compares exactly two sets. Returns True if all the elements of the other set are included in the current one, False otherwise.

pop()

Removes a random element from a set.

symmetric_difference(other)

Compares exactly two sets. Returns a set that includes the items that are not shared between sets.

symmetric_difference_update(other)

this ^ other

In-place version of symmetric_difference(). Updates the current set to include all values that aren’t shared between the sets.

union(*other)

this | other

Returns all the elements in all of the 2+ sets. Duplicates intrinsically disallowed.

update(*other)

other is any iterable(s) to be added to the original set. Cannot be called & printed simultaneously.

As you might be able to tell by the function names and alternative code, Python sets follow many of the same logical operations as mathematical sets.

The add() function is where we see our disorder appear. If you’re adding value(s) to a set and run your code multiple times, you might notice that the order of the set differs if you print the final set. The location of the item is determined by system memory, which changes frequently. You can check if sets match by using == as follows:

fruits = {"apple", "banana", "strawberry"}
fruits2 = {"apple", "strawberry", "banana"}
print(fruits == fruits2)
True

The drawback here is that sets cannot be indexed like lists, so if you need to evaluate elements of your group at the item level instead of the group level, lists will better fit your needs.

Sets support set comprehension, much like lists support list comprehension. This example is taken from the Python sets documentation.

magic = {x for x in 'abracadabra' if x not in 'abc'}
print(magic)
{'r', 'd'}

Wait, shouldn’t this return "{'r', 'd', 'r'}"? Not quite…​


Duplicate Removal

The main appeal of sets is its quick handling of duplicates. While a list’s unique elements can be found by wrapping it with the set() function or calling unique() on it, you can use a set to prevent duplicates from being added in the first place. Take the following example:

fish = {'salmon', 'tuna', 'cod'}
fish.add('cod')
print(fish)
{'salmon', 'cod', 'tuna'}

Recall that even if the order of the set changes, as long as == returns True, the sets are equivalent. This example demonstrates that sets handle duplicates on their own — no exception is thrown, even though "cod" is already in fish.

In our "abracadabra" example above, this is why "{'r', 'd'}" is the output — set comprehension, in that case, got the unique non-abc letters in "abracadabra", not every instance of them.

One application of this property is tracking the unique words in a document. If you’re parsing the file, you can add each word to a set and be left with every different word that appears in the document. If you need to access the independent elements for any reason, you can recast it using list().

Examples

How would I take the word "banana" out of a set if I did know it was included?

fruits = {'orange', 'grapefruit', 'banana'}
fruits.remove('banana')
print(fruits)
{'orange', 'grapefruit'}

Repeat the prior example, but what if we did not know the contents of the set?

fruits = {'orange', 'grapefruit', 'banana'}
fruits.discard('banana')
print(fruits)
{'orange', 'grapefruit'}

How do I determine if "Kings" is in the set "teams"?

teams = {'Kings', 'Lakers', 'Clippers', 'Suns', 'Warriors'}
'Kings' in teams
True

How do I find the union of multiple sets?

birds = {'blue jay', 'eagle', 'turkey'}
meats = {'fish', 'roast beef', 'turkey'}
seafood = {'fish', 'shellfish'}
print(birds | meats | seafood)
{'fish', 'roast beef', 'ham', 'eagle', 'turkey', 'blue jay', 'shellfish'}