Writing Functions

In this example, we analyze the number of rides on the Disney ride dinosaur on the date 06/30/2017.

import pandas as pd

myridename = 'dinosaur'

mydate = '06/30/2017'

total_count = 0

for mydf in pd.read_csv(f'/anvil/projects/tdm/data/disney/{myridename}.csv', chunksize = 10000):
    for index, row in mydf.iterrows():
        if row['date'] == mydate:
            total_count += 1

total_count

I like to check each analysis at least 5 times before I wrap it into a function. Doing so lets me get familiar with the data, fix any mistakes, and get a feel for the types of arguments I will use in my function.

This is the function that I created, based on the work from the earlier examples.

import pandas as pd

def getdisneyrides(mydate: str, myridename: str) -> int:
    """
    getdisneyrides is a function that accepts mydate and myridename as arguments,
    and returns the number of rides that occur on mydate for myridename.

    Args:
        mydate (str): The date on which we are counting the number of rides.
        myridename (str): The ride on which we are counting the number of rides.

    Returns:
        int: The number of rides on myridename during mydate.
    """
    total_count = 0
    for mydf in pd.read_csv(f'/anvil/projects/tdm/data/disney/{myridename}.csv', chunksize = 10000):
        for index, row in mydf.iterrows():
            if row['date'] == mydate:
                total_count += 1
    return total_count

Now we can more easily analyze the number of rides on the Disney ride dinosaur on the date 06/30/2017.

getdisneyrides('06/30/2017', 'dinosaur')
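Looping over rows with iterrows is easy to read, but it can be slow on large files. As a possible alternative (not the approach used above), each chunk can be compared against the date in one step and the matches summed. The sketch below assumes a tiny made-up CSV held in memory; the column name wait_minutes and the function name count_rows_on_date are hypothetical and only there to make the example run.

```python
import io
import pandas as pd

# Made-up sample rows standing in for a ride file; the real files are much larger.
sample_csv = io.StringIO(
    "date,wait_minutes\n"
    "06/29/2017,35\n"
    "06/30/2017,40\n"
    "06/30/2017,45\n"
    "06/30/2017,50\n"
)

def count_rows_on_date(csv_source, mydate: str, chunksize: int = 2) -> int:
    """Count the rows whose 'date' column equals mydate, one chunk at a time."""
    total_count = 0
    for mydf in pd.read_csv(csv_source, chunksize=chunksize):
        # Comparing the whole column at once yields True/False values;
        # summing them counts the True entries in this chunk.
        total_count += int((mydf['date'] == mydate).sum())
    return total_count

result = count_rows_on_date(sample_csv, '06/30/2017')
print(result)  # 3
```

The result should match what the row-by-row loop would produce on the same data; only the way each chunk is scanned changes.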

We can study the number of rides on dinosaur for that entire week, from June 24, 2017 to June 30, 2017:

for i in range(24,31):
    print(getdisneyrides(f'06/{i}/2017', 'dinosaur'))

We can also tally the total number of rides for the week.

mytotalweek = 0
for i in range(24,31):
    mytotalweek += getdisneyrides(f'06/{i}/2017', 'dinosaur')
print(mytotalweek)

We can even build a dictionary with the number of rides on each day.

mydinosaurdictionary = {}
for i in range(24,31):
    mydinosaurdictionary.update([ (f'06/{i}/2017', getdisneyrides(f'06/{i}/2017', 'dinosaur')) ])
mydinosaurdictionary

and then we can look up the number of rides on a given day.

mydinosaurdictionary['06/24/2017']
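The same kind of dictionary can also be built in a single expression with a dictionary comprehension. This is only a sketch: fake_ride_count is a hypothetical stand-in for getdisneyrides (it simply returns the day of the month) so that the example runs without access to the data files.

```python
def fake_ride_count(mydate: str, myridename: str) -> int:
    """Hypothetical stand-in for getdisneyrides: returns the day of the month."""
    return int(mydate.split('/')[1])

# One key-value pair per day, from 06/24/2017 through 06/30/2017.
week_counts = {
    f'06/{i}/2017': fake_ride_count(f'06/{i}/2017', 'dinosaur')
    for i in range(24, 31)
}

print(week_counts['06/24/2017'])   # 24
print(sum(week_counts.values()))   # 189, the weekly total of the stand-in values
```

With the real function in place of the stand-in, sum(week_counts.values()) would give the weekly tally without a second pass over the files.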

In this example, we analyze the number of donations to federal election campaigns, during the 1984 election cycle, from donors in Illinois.

If you try this election example yourself, note that the data files no longer contain headers. You need to pass header=None to pd.read_csv, and you need to refer to www.fec.gov/campaign-finance-data/contributions-individuals-file-description/ to build the column headers for this data.

import pandas as pd

myyear = 1984

mystate = 'IL'

total_count = 0

for mydf in pd.read_csv(f'/anvil/projects/tdm/data/election/itcont{myyear}.txt', chunksize = 10000, delimiter = '|'):
    for index, row in mydf.iterrows():
        if row['STATE'] == mystate:
            total_count += 1

total_count
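If the files have no header row, as the note above describes, the column names have to be supplied by hand. The sketch below shows one way to do that with header=None and names. The column list is my reading of the FEC file description page and should be checked against it, and the records are made up (only STATE is filled in) so the example runs without the real data.

```python
import io
import pandas as pd

# Column names as I understand them from the FEC file description page;
# verify against that page before relying on them.
FEC_COLUMNS = [
    'CMTE_ID', 'AMNDT_IND', 'RPT_TP', 'TRANSACTION_PGI', 'IMAGE_NUM',
    'TRANSACTION_TP', 'ENTITY_TP', 'NAME', 'CITY', 'STATE', 'ZIP_CODE',
    'EMPLOYER', 'OCCUPATION', 'TRANSACTION_DT', 'TRANSACTION_AMT',
    'OTHER_ID', 'TRAN_ID', 'FILE_NUM', 'MEMO_CD', 'MEMO_TEXT', 'SUB_ID',
]

def make_fake_row(state: str) -> str:
    """Build one made-up, pipe-delimited record with only STATE filled in."""
    values = [''] * len(FEC_COLUMNS)
    values[FEC_COLUMNS.index('STATE')] = state
    return '|'.join(values)

# Four made-up records standing in for a headerless itcont file.
fake_file = io.StringIO('\n'.join(make_fake_row(s) for s in ['IL', 'IN', 'IL', 'OH']))

total_count = 0
for mydf in pd.read_csv(fake_file, delimiter='|', header=None,
                        names=FEC_COLUMNS, chunksize=2):
    # With names supplied, row['STATE'] and mydf['STATE'] work as before.
    total_count += int((mydf['STATE'] == 'IL').sum())

print(total_count)  # 2
```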

Now I have checked my work for at least 5 pairs of years and states, and I am ready to wrap my work into a function!


This is the function that I created, based on the work from the earlier examples.

import pandas as pd

def getdonations(myyear: int, mystate: str) -> int:
    """
    getdonations is a function that accepts myyear and mystate as arguments,
    and returns the number of donations from mystate during myyear.

    Args:
        myyear(int): The year in which we are counting the number of donations.
        mystate(str): The state in which we are counting the number of donations.

    Returns:
        int: The number of donations from mystate during myyear.
    """
    total_count = 0
    for mydf in pd.read_csv(f'/anvil/projects/tdm/data/election/itcont{myyear}.txt', chunksize = 10000, delimiter = '|'):
        for index, row in mydf.iterrows():
            if row['STATE'] == mystate:
                total_count += 1
    return total_count

This function allows us to study the number of donations from any state, during any (even-numbered) year.

getdonations(1984, 'IL')

getdonations(1984, 'OH')

getdonations(1984, 'IN')

getdonations(1982, 'IN')

getdonations(1980, 'IN')

We can now (more easily) find the number of donations from Indiana donors during each of the election cycles 1980, 1982, 1984, 1986, and 1988:

myyear = 1980
while myyear < 1990:
    print(getdonations(myyear, 'IN'))
    myyear += 2
# This will print the number of donations in Indiana during the years 1980, 1982, 1984, 1986, 1988
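The while loop above steps through the years by hand. As a small sketch of an alternative, range accepts a third argument, the step, which produces the same sequence of even-numbered years without a manual counter.

```python
# range(start, stop, step): start at 1980, stop before 1990, step by 2.
election_years = list(range(1980, 1990, 2))
print(election_years)  # [1980, 1982, 1984, 1986, 1988]

for myyear in range(1980, 1990, 2):
    # In the real analysis, this is where getdonations(myyear, 'IN') would be called.
    print(myyear)
```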

We can also make a dictionary that keeps track of the donations above.

myindianadictionary = {}
myyear = 1980
while myyear < 1990:
    myindianadictionary.update([ (myyear, getdonations(myyear, 'IN')) ])
    myyear += 2
# This will create a dictionary with 5 key-value pairs,
# corresponding to the number of donations in Indiana during the years 1980, 1982, 1984, 1986, 1988

Here are the contents of that dictionary:

myindianadictionary

It is easy to query values from the dictionary of donations from Indiana donors, looking up the values for each year, as follows:

myindianadictionary[1980]

myindianadictionary[1982]

myindianadictionary[1984]

New Videos For Project 6

First we import Pandas.

import pandas as pd

Then we load the getdisneyrides function, without any changes from our example last week.

def getdisneyrides(mydate: str, myridename: str) -> int:
    """
    getdisneyrides is a function that accepts mydate and myridename as arguments,
    and returns the number of rides that occur on mydate for myridename.

    Args:
        mydate (str): The date on which we are counting the number of rides.
        myridename (str): The ride on which we are counting the number of rides.

    Returns:
        int: The number of rides on myridename during mydate.
    """
    total_count = 0
    for mydf in pd.read_csv(f'/anvil/projects/tdm/data/disney/{myridename}.csv', chunksize = 10000):
        for index, row in mydf.iterrows():
            if row['date'] == mydate:
                total_count += 1
    return total_count

We remind ourselves how to use this function getdisneyrides:

getdisneyrides('06/30/2017', 'dinosaur')

and another example, in which we print 3 lines, namely, the number of rides on each of three different dates, on the ride dinosaur.

for mydate in ['06/23/2017', '06/25/2017', '06/30/2017']:
    print('On the day', mydate, 'there were a total of', getdisneyrides(mydate, 'dinosaur'), 'rides on the dinosaur')

which has this output:

On the day 06/23/2017 there were a total of 157 rides on the dinosaur
On the day 06/25/2017 there were a total of 160 rides on the dinosaur
On the day 06/30/2017 there were a total of 158 rides on the dinosaur

and now we adjust this example, so that mydate is in braces in a formatted string:

for mydate in ['06/23/2017', '06/25/2017', '06/30/2017']:
    print(f'On the day {mydate} there were a total of', getdisneyrides(mydate, 'dinosaur'), 'rides on the dinosaur')

which has the same output:

On the day 06/23/2017 there were a total of 157 rides on the dinosaur
On the day 06/25/2017 there were a total of 160 rides on the dinosaur
On the day 06/30/2017 there were a total of 158 rides on the dinosaur

Now we use a double for loop, in which we compute the number of rides on each of 3 dates for each of 3 rides:

for mydate in ['06/23/2017', '06/25/2017', '06/30/2017']:
    for myride in ['dinosaur', '7_dwarfs_train', 'soarin']:
        print(f'On the day {mydate} there were a total of', getdisneyrides(mydate, myride), f'rides on the {myride}')

which has this output:

On the day 06/23/2017 there were a total of 157 rides on the dinosaur
On the day 06/23/2017 there were a total of 143 rides on the 7_dwarfs_train
On the day 06/23/2017 there were a total of 126 rides on the soarin
On the day 06/25/2017 there were a total of 160 rides on the dinosaur
On the day 06/25/2017 there were a total of 135 rides on the 7_dwarfs_train
On the day 06/25/2017 there were a total of 114 rides on the soarin
On the day 06/30/2017 there were a total of 158 rides on the dinosaur
On the day 06/30/2017 there were a total of 136 rides on the 7_dwarfs_train
On the day 06/30/2017 there were a total of 117 rides on the soarin

Now we are ready to wrap this work into a function called getdisneyreport:

def getdisneyreport(mylistofdates: list, mylistofrides: list):
    """
    getdisneyreport is a function that accepts mylistofdates and mylistofrides as arguments,
    and prints the number of rides that occur on each date in mylistofdates for each ride in mylistofrides.

    Args:
        mylistofdates (list): The dates on which we are counting the number of rides.
        mylistofrides (list): The rides on which we are counting the number of rides.

    Returns:
        Nothing.  Instead, we just print the values on each day for each ride.
    """
    for mydate in mylistofdates:
        for myride in mylistofrides:
            print(f'On the day {mydate} there were a total of', getdisneyrides(mydate, myride), f'rides on the {myride}')

and we use the function to print the same output as before:

getdisneyreport(['06/23/2017', '06/25/2017', '06/30/2017'], ['dinosaur', '7_dwarfs_train', 'soarin'])

which outputs the same values as above:

On the day 06/23/2017 there were a total of 157 rides on the dinosaur
On the day 06/23/2017 there were a total of 143 rides on the 7_dwarfs_train
On the day 06/23/2017 there were a total of 126 rides on the soarin
On the day 06/25/2017 there were a total of 160 rides on the dinosaur
On the day 06/25/2017 there were a total of 135 rides on the 7_dwarfs_train
On the day 06/25/2017 there were a total of 114 rides on the soarin
On the day 06/30/2017 there were a total of 158 rides on the dinosaur
On the day 06/30/2017 there were a total of 136 rides on the 7_dwarfs_train
On the day 06/30/2017 there were a total of 117 rides on the soarin
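A report function that only prints is convenient to read, but its values cannot be reused later. As a sketch of one alternative design, the function below returns a dictionary keyed by (date, ride) pairs instead. The names build_ride_report and fake_count are hypothetical; fake_count stands in for getdisneyrides so the example runs without the data files.

```python
from typing import Callable

def build_ride_report(mylistofdates: list, mylistofrides: list,
                      count_rides: Callable[[str, str], int]) -> dict:
    """Return a dictionary mapping (date, ride) pairs to ride counts."""
    report = {}
    for mydate in mylistofdates:
        for myride in mylistofrides:
            report[(mydate, myride)] = count_rides(mydate, myride)
    return report

def fake_count(mydate: str, myride: str) -> int:
    """Hypothetical stand-in for getdisneyrides: returns the length of the ride name."""
    return len(myride)

report = build_ride_report(['06/23/2017', '06/25/2017'],
                           ['dinosaur', 'soarin'], fake_count)
print(report[('06/25/2017', 'soarin')])  # 6, from the stand-in counter
```

Passing the counting function as an argument keeps the sketch independent of the data; with the real files available, getdisneyrides could be passed in place of fake_count.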

If you try this election example yourself, note that the data files no longer contain headers. You need to pass header=None to pd.read_csv, and you need to refer to www.fec.gov/campaign-finance-data/contributions-individuals-file-description/ to build the column headers for this data.

First we import Pandas.

import pandas as pd

Then we load the getdonations function, without any changes from our example last week.

def getdonations(myyear: int, mystate: str) -> int:
    """
    getdonations is a function that accepts myyear and mystate as arguments,
    and returns the number of donations from mystate during myyear.

    Args:
        myyear(int): The year in which we are counting the number of donations.
        mystate(str): The state in which we are counting the number of donations.

    Returns:
        int: The number of donations from mystate during myyear.
    """
    total_count = 0
    for mydf in pd.read_csv(f'/anvil/projects/tdm/data/election/itcont{myyear}.txt', chunksize = 10000, delimiter = '|'):
        for index, row in mydf.iterrows():
            if row['STATE'] == mystate:
                total_count += 1
    return total_count

# reminder to myself:  We only used the delimiter='|' because the data (as we saw last week) for the election donations
# has a pipe symbol, rather than a comma, in between the pieces of data in the source files.

We remind ourselves how to use this function getdonations:

getdonations(1980, 'IN')

and another example, in which we print 6 lines, namely, the number of donations in each year in mylistofyears from each state in mylistofstates.

mylistofyears = [1980, 1982]
mylistofstates = ['IN', 'IL', 'OH']

for myyear in mylistofyears:
    for mystate in mylistofstates:
        print(f'The number of donations from {mystate} in the year {myyear} was', getdonations(myyear, mystate))

which has this output:

The number of donations from IN in the year 1980 was 4606
The number of donations from IL in the year 1980 was 15895
The number of donations from OH in the year 1980 was 10865
The number of donations from IN in the year 1982 was 2274
The number of donations from IL in the year 1982 was 5681
The number of donations from OH in the year 1982 was 4545

and now we wrap this work into a function called mydonationreport:

def mydonationreport(mylistofyears: list, mylistofstates: list):
    """
    mydonationreport is a function that accepts mylistofyears and mylistofstates as arguments,
    and prints the number of donations from each state in mylistofstates during each year in mylistofyears.

    Args:
        mylistofyears(list): The list of years in which we are counting the number of donations.
        mylistofstates(list): The list of states we are counting the number of donations.

    Returns:
        Nothing.  Instead, it outputs the values.
    """
    for myyear in mylistofyears:
        for mystate in mylistofstates:
            print(f'The number of donations from {mystate} in the year {myyear} was', getdonations(myyear, mystate))

and we use the function to print the same output as before:

mydonationreport([1980, 1982], ['IN', 'IL', 'OH'])

which outputs the same values as above:

The number of donations from IN in the year 1980 was 4606
The number of donations from IL in the year 1980 was 15895
The number of donations from OH in the year 1980 was 10865
The number of donations from IN in the year 1982 was 2274
The number of donations from IL in the year 1982 was 5681
The number of donations from OH in the year 1982 was 4545

Introduction

At their core, functions pack several actions into one named, reusable unit. In longer, more complicated projects, writing Python functions is crucial for keeping code readable and at a reasonable length.


Function Signature & Annotations

Understanding the syntax and terminology surrounding a function is an important step, both for reading documentation about functions and for communicating what your own function does. Consider the following code:

def word_count(sentence: str) -> int:
    """
    word_count is a function that accepts a sentence as an argument,
    and returns the number of words in the sentence.

    Args:
        sentence (str): The sentence for which we are counting the words.

    Returns:
        int: The number of words in the sentence
    """
    result = len(sentence.split())
    return result

test_sentence = "this is a sentence that has eight words."
word_count(test_sentence)
8

There are a few things we need to define and clarify:

  • Function name: The name of the function immediately follows the def keyword. This function is called word_count and we will refer to functions by name in most cases.

  • Parameters: This is another term for the function’s inputs; a function can have zero or more. There is one parameter in this function, called sentence.

    • In Python, you can include the data type after the parameter name. Above, this is : str, which specifies that sentence is a string. We recommend specifying types because the methods you apply to a parameter might not work if the argument is a different data type.

  • Output: This is another optional part of the signature, where you can specify what the function returns. In the example, this is represented by -> int. Functions can return zero or more values (a function with no return statement returns None, and several values are returned together as a tuple).

All of the above qualities define the signature of the function, and as you read, many of them are optional. We could write word_count in the following way and it would behave exactly the same:

def word_count(sentence):
    """
    word_count is a function that accepts a sentence as an argument,
    and returns the number of words in the sentence.

    Args:
        sentence (str): The sentence for which we are counting the words.

    Returns:
        int: The number of words in the sentence
    """
    result = len(sentence.split())
    return result

test_sentence = "this is a sentence that has eight words."
word_count(test_sentence)
8

The umbrella term function annotations includes all the optional parts of a function’s signature. Though optional, it’s recommended to include them in larger projects for clarity and to make your code look more "professional."
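It helps to know that Python stores annotations but does not enforce them when the function is called. The sketch below (using a shortened copy of word_count without its docstring) inspects the stored annotations and then passes a list on purpose, to show that the failure comes from the missing split method rather than from the type hint.

```python
def word_count(sentence: str) -> int:
    return len(sentence.split())

# The annotations are kept in a dictionary on the function object.
print(word_count.__annotations__)

# Annotations are hints only: passing a list is not rejected up front.
# The call fails later, when .split() is looked up on the list.
try:
    word_count(['not', 'a', 'string'])
    error_name = None
except AttributeError as err:
    error_name = type(err).__name__
    print(err)
```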


Arguments

When calling a function, arguments are not all the same. In Python, there are positional and keyword arguments. For example:

def add_x_multiply_by_y(value: int, x: int, y: int) -> int:
    return (value+x)*y

add_x_multiply_by_y(2, 3, 4)
20

Here, 2, 3, and 4 are positional arguments. The order in which the arguments are passed (their positions) determines which parameter each argument is assigned to. If we were to rearrange the order in which we passed our values, it would change the result:

add_x_multiply_by_y(2, 4, 3)
18

Keyword arguments can be used to specify where the values are assigned, so you can control the variable values regardless of the order in which they come. We’ll use the function from before:

add_x_multiply_by_y(2, y=4, x=3)
20

Keywords allow the output to match that of the first example even though the order is different. Unfortunately, this aspect of functions is not all-powerful: positional arguments must come before keyword arguments. Otherwise, you get an error that resembles SyntaxError: positional argument follows keyword argument (<string>, line X).
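Because this is a syntax error, Python rejects the call before running any code. A small sketch of how to see the message without crashing a notebook is to hand the offending call to the built-in compile function as a string and catch the SyntaxError; the variable names here are only for illustration.

```python
# A call that places positional arguments after a keyword argument.
bad_call = "add_x_multiply_by_y(x=3, 2, 4)"

try:
    # compile only parses the text; the function is never actually called.
    compile(bad_call, "<string>", "eval")
    message = None
except SyntaxError as err:
    message = err.msg

print(message)
```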


Docstrings

Docstrings are multi-line strings immediately following the function declaration that provide documentation. Conventionally, they describe what the function does in a style that is consistent between docstrings. If the function contains any arguments or return values, their purposes are defined and described.

We’ll put word_count from the top of the page here for convenience.

def word_count(sentence: str) -> int:
    """
    word_count is a function that accepts a sentence as an argument,
    and returns the number of words in the sentence.

    Args:
        sentence (str): The sentence for which we are counting the words.

    Returns:
        int: The number of words in the sentence
    """
    result = len(sentence.split())
    return result

test_sentence = "this is a sentence that has eight words."
word_count(test_sentence)

If you’re using a function written by someone else and want to access the docstring, you can use print or help as follows:

print(word_count.__doc__)
word_count is a function that accepts a sentence as an argument,
and returns the number of words in the sentence.

     Args:
         sentence (str): The sentence for which we are counting the words.

     Returns:
         int: The number of words in the sentence
help(word_count)
Help on function word_count in module __main__:

word_count(sentence: str) -> int
    word_count is a function that accepts a sentence as an argument,
    and returns the number of words in the sentence.

    Args:
       sentence (str): The sentence for which we are counting the words.

    Returns:
        int: The number of words in the sentence
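The raw __doc__ attribute can keep the leading newline and, depending on the Python version, the indentation from the source file. As a sketch of a third option, the standard library's inspect.getdoc returns the docstring with that leading whitespace cleaned up. The function below is a shortened copy of word_count.

```python
import inspect

def word_count(sentence: str) -> int:
    """
    word_count is a function that accepts a sentence as an argument,
    and returns the number of words in the sentence.
    """
    return len(sentence.split())

# inspect.getdoc strips the leading blank line and the common indentation.
clean_doc = inspect.getdoc(word_count)
print(clean_doc)
```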

Alternatively, if you’re coding in an IDE, you might have the ability to hover over the function call and view the docstring.

Figure 1. Docstring Hovering

It’s good practice to write docstrings for every function, especially if you work with other programmers and they rely on the functions that you write.


Helper Functions

Functions can have helper functions nested within them, with the goal of reducing complexity or increasing clarity. For example, let’s say we wanted our function to strip all punctuation before counting the words:

import string

def word_count(sentence: str) -> int:
    """
    word_count is a function that accepts a sentence as an argument,
    and returns the number of words in the sentence.

    Args:
        sentence (str): The sentence for which we are counting the words.

    Returns:
        int: The number of words in the sentence
    """

    def _strip_punctuation(sentence: str):
        """
        helper function to strip punctuation.
        """
        return sentence.translate(str.maketrans('', '', string.punctuation))

    sentence_no_punc = _strip_punctuation(sentence)
    result = len(sentence_no_punc.split())
    return result

test_sentence = "this is a sentence - it has eight words."
word_count(test_sentence)
8

Here, our helper function is named _strip_punctuation. If you try to call helper functions outside of word_count, you will get an error, as it is defined within the scope of word_count and is not available outside that scope. In this example, word_count is the "caller" while _strip_punctuation is the "callee."

You can use your own naming convention to mark helper functions. Here, we use a preceding "_" to hint that the function is just for internal use.
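The scoping rule can be checked directly. In this sketch (a shortened copy of word_count without docstrings), the nested helper works when called from inside word_count, while calling it from the outside raises a NameError.

```python
import string

def word_count(sentence: str) -> int:
    def _strip_punctuation(text: str) -> str:
        # Only visible inside word_count.
        return text.translate(str.maketrans('', '', string.punctuation))
    return len(_strip_punctuation(sentence).split())

print(word_count("this is a sentence - it has eight words."))  # 8

try:
    _strip_punctuation("hello, world")   # not defined at this level
    outcome = "callable"
except NameError as err:
    outcome = "NameError"
    print(err)
```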


In Python, functions can be passed to other functions as arguments. If a function accepts another function as an argument or returns a function, we refer to it as a higher-order function. Some examples of higher-order functions in Python are map, filter, and functools.reduce. If a function is used as an argument in another function, we refer to it as a callback function.
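A short sketch may make these terms concrete. Here apply_twice and add_three are hypothetical names: apply_twice is a higher-order function and add_three is the callback passed to it. The built-in map and filter, and reduce from functools, are used the same way.

```python
from functools import reduce

def apply_twice(func, value):
    """Higher-order function: calls the callback func on value two times."""
    return func(func(value))

def add_three(n):
    return n + 3

words = ["this", "is", "a", "sentence"]

lengths = list(map(len, words))                          # len is the callback
long_words = list(filter(lambda w: len(w) > 2, words))   # keep words longer than 2
total_length = reduce(lambda a, b: a + b, lengths)       # fold the list into one number

print(apply_twice(add_three, 10))  # 16
print(lengths)                     # [4, 2, 1, 8]
print(long_words)                  # ['this', 'sentence']
print(total_length)                # 15
```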


Packing & Unpacking

Say we have a function that returns a list of the matching strings found within a paragraph, so its output has n elements, where n depends on the input. If we want to pass those results to a higher-order function, how many parameters should that function declare? The answer is n, but n changes from call to call. We address this with *args and **kwargs, two ways of accepting a variable number of arguments.

The formal name for *args is argument tuple packing. Here are a few demonstrations:

def sum_then_multiply_by_x(x = 0, *args):
    print(args)
    return sum(args) * x

print(sum_then_multiply_by_x(2, 1, 2, 3))
(1, 2, 3)
12
print(sum_then_multiply_by_x(2, 1, 2, 3, 4))
(1, 2, 3, 4)
20
print(sum_then_multiply_by_x(2, 1, 2, 3, 4, 5))
(1, 2, 3, 4, 5)
30

Here, every argument passed after the x argument is packed into a tuple called args. As you can see, you can pass any number of arguments and the function won’t break. Awesome!

Unpacking deals with expanding an n-sized tuple into a function with n arguments. Take the following example:

def print_boo_YAH(boo, yah):
    print(f'{boo}{yah.upper()}')

# normally we would call this function like so:
print_boo_YAH("first", "second")
firstSECOND
# but we can also call this function in this way:
words = ("boo", "yah")
print_boo_YAH(*words)
booYAH

Pay attention to the asterisk before the tuple argument. Without it, the tuple is passed as a single argument and is not unpacked.


Now that we have *args established, we can discuss **kwargs for dictionary packing and unpacking. The "kw" in **kwargs represents keyword, which takes the form x="something". Keyword arguments were introduced in the Arguments section above. Take a look at this example:

def print_arguments(**kwargs):
    for key, value in kwargs.items():
        print(f'key: {key}, value: {value}')

print_arguments(arg1="pretty", arg2="princess")
print_arguments(arg1="pretty", arg2="pretty", arg3="princess")
key: arg1, value: pretty
key: arg2, value: princess

key: arg1, value: pretty
key: arg2, value: pretty
key: arg3, value: princess

For **kwargs, unpacking comes in the form of a dictionary instead of a tuple. Here’s an example:

def wild_animals(lions, tigers, bears):
    print(f'lions: {lions}')
    print(f'tigers: {tigers}')
    print(f'bears: {bears}')
    print('oh my!')

my_dict_to_unpack = {"lions":["bernice", "sandra", "arnold"],
                    "tigers":["janice"],
                    "bears":('paul', 'jim', 'dwight')}
wild_animals(**my_dict_to_unpack)
## lions: ['bernice', 'sandra', 'arnold']
## tigers: ['janice']
## bears: ('paul', 'jim', 'dwight')
## oh my!
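Packing and unpacking are often used together when one function forwards its arguments to a callback, which is the situation that motivated this section. The sketch below uses a hypothetical wrapper named call_and_report; it packs whatever it receives and then unpacks it again when calling the callback.

```python
def call_and_report(func, *args, **kwargs):
    """Call func with whatever positional and keyword arguments were supplied."""
    # args is a tuple and kwargs is a dictionary at this point (packing).
    print(f'calling {func.__name__} with args={args} kwargs={kwargs}')
    # The * and ** here expand them back into separate arguments (unpacking).
    return func(*args, **kwargs)

def add_x_multiply_by_y(value, x, y):
    return (value + x) * y

result = call_and_report(add_x_multiply_by_y, 2, y=4, x=3)
print(result)  # 20
```

The wrapper never needs to know how many parameters the callback declares, so the same wrapper works for any function.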


Default Values & Exclusive Positional/Keyword Assignment

Arguments in Python can have default values, just like in many other languages. This functionality is useful for situations where you don’t always use all of the available arguments: just give the optional parameters a default such as None or 0. We’ll edit the function from before:

def add_x_multiply_by_y(value: int, x: int, y: int = 5) -> int:
    return (value+x)*y

add_x_multiply_by_y(1, 2)
15

1 and 2 are positional arguments for value and x, while y is set to 5 when not included in the function call.
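Python's "null" value is None, and it is a common default when the function needs to tell "not supplied" apart from a real value such as 0. The following is a sketch of that idea under my own assumption about the desired behaviour (skip the multiplication when y is omitted); it is a variation for illustration, not the function defined above.

```python
from typing import Optional

def add_x_multiply_by_y(value: int, x: int, y: Optional[int] = None) -> int:
    # None signals "y was not supplied", so the multiplication is skipped.
    if y is None:
        return value + x
    return (value + x) * y

print(add_x_multiply_by_y(1, 2))     # 3: y omitted, so only the sum is returned
print(add_x_multiply_by_y(1, 2, 0))  # 0: an explicit 0 is still honoured
```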

There’s a catch when considering default values: when writing the function, parameters with default values must occupy the last spot(s) in the signature, otherwise the function definition will not run. The following example generates a SyntaxError with the message non-default argument follows default argument (<string>, line X); the exact wording can differ between Python versions:

def add_x_multiply_by_y(value: int = 0, x: int, y: int) -> int:
    return (value+x)*y

add_x_multiply_by_y(x=1, y=3)

By default, you can pass arguments as either positional or keyword arguments. If you want to, you can create parameters that are positional-only or keyword-only. To make a parameter keyword-only, place it after tuple packing (*args) in the signature, in the following manner:

def sum_then_multiply_by_x(*args, x) -> int:
    return sum(args)*x

sum_then_multiply_by_x(1,2,3,4, x=5)
50

The logic here is pretty straightforward: if you don’t include the keyword, Python will assume that every value is part of *args, x will never receive a value, and the call will fail with a TypeError. However, if we have a positional parameter before *args, all will be fine:

def sum_then_multiply_by_x(x, *args) -> int:
    return sum(args)*x

sum_then_multiply_by_x(1,2,3,4,5)
14

Positional assignment means that the first argument is assigned to the first available parameter, and the rest are packed into *args. If this is the case, how do we require that some arguments be positional-only? We use / as a separate entry in the parameter list (Python 3.8 and later), which asserts that every parameter before / is positional-only:

def sum_then_multiply_by_x(one, two, /, three, x) -> int:
    return sum([one, two, three])*x

print(sum_then_multiply_by_x(1,2,3,4)) # all positional, will work
print(sum_then_multiply_by_x(1,2,three=3,x=5)) # two keyword, two positional, will work
print(sum_then_multiply_by_x(1,two=2,three=3,x=6)) # a positional only argument was passed as a keyword argument, error
24
30
`sum_then_multiply_by_x() got some positional-only arguments passed as keyword arguments: 'two'`
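The / marker has a counterpart: a bare * in the parameter list makes every parameter after it keyword-only, without collecting extra positional arguments the way *args does. The sketch below combines both markers in a variation of the function above; the variable names a, b, and error_text are only for illustration.

```python
def sum_then_multiply_by_x(one, two, /, three, *, x) -> int:
    # one, two: positional-only; three: either style; x: keyword-only.
    return sum([one, two, three]) * x

a = sum_then_multiply_by_x(1, 2, 3, x=4)        # three passed positionally
b = sum_then_multiply_by_x(1, 2, three=3, x=5)  # three passed by keyword
print(a)  # 24
print(b)  # 30

try:
    sum_then_multiply_by_x(1, 2, 3, 4)  # x passed positionally: rejected
    error_text = None
except TypeError as err:
    error_text = str(err)
    print(error_text)
```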

While many of the topics we discussed in this section are optional, we hope you walk away with a better understanding of how function arguments work and why some errors may appear when your code looks fine.


Examples

Write a function called get_filename_from_url that, given a url to a file, like image.shutterstock.com/image-vector/cute-dogs-line-art-border-260nw-1079902403.jpg returns the filename with the extension.

import os
from urllib.parse import urlparse

def get_filename_from_url(url: str) -> str:
    """
    Given a link to a file, return the filename with extension.

    Args:
        url (str): The url of the file.

    Returns:
        str: A string with the filename, including the file extension.
    """
    return os.path.basename(urlparse(url).path)
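A quick usage sketch follows, with the function repeated (without its docstring) so the block runs on its own. The first call uses the address from the exercise as written; the second address, with a query string and fragment, is a hypothetical one added to show why the URL is parsed before the basename is taken.

```python
import os
from urllib.parse import urlparse

def get_filename_from_url(url: str) -> str:
    return os.path.basename(urlparse(url).path)

plain = get_filename_from_url(
    'image.shutterstock.com/image-vector/cute-dogs-line-art-border-260nw-1079902403.jpg'
)
# Hypothetical URL: urlparse separates the path from the query and fragment,
# so neither ends up in the filename.
with_query = get_filename_from_url('https://example.com/images/photo.png?size=large#top')

print(plain)       # cute-dogs-line-art-border-260nw-1079902403.jpg
print(with_query)  # photo.png
```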

Write a function that, given a URL to an image, and a full path to a directory, saves the image to the provided directory. By default, have the function save the images to the user’s home directory in a UNIX-like operating system.

import requests
from pathlib import Path
import getpass

def scrape_image(from_url: str, to_dir: str = f'/home/{getpass.getuser()}'):
    """
    Given a url to an image, scrape the image and save the image to the provided directory.
    If no directory is provided, by default, save to the user's home directory.

    Args:
        from_url (str): The URL of the image to download.
        to_dir (str, optional): The directory in which to save the image. Defaults to f'/home/{getpass.getuser()}'.
    """
    resp = requests.get(from_url)

    # this function is from the previous example
    filename = get_filename_from_url(from_url)

    # Make directory if doesn't already exist
    Path(to_dir).mkdir(parents=True, exist_ok=True)

    # Write the image bytes; the with block closes the file even if the write fails
    with open(f'{to_dir}/{filename}', "wb") as file:
        file.write(resp.content)