DataFrames

What is a DataFrame?

A DataFrame is one of the most commonly used classes in the Pandas package. Much like data.frames in R, DataFrames in Pandas store tabular, two-dimensional datasets.

The most common operations start with reading data into a DataFrame, then accessing the DataFrame’s attributes and using its methods to perform operations on the underlying data or with other DataFrames.
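As a minimal sketch of that workflow, the example below reads a small CSV into a DataFrame, inspects an attribute, and calls a method. The CSV text, the column names, and the variable names are made up for illustration, and an in-memory io.StringIO buffer stands in for a file on disk.

```python
import io

import pandas as pd

# Hypothetical CSV content; in practice this would usually be a file path.
csv_text = "count,fruit\n1,pear\n2,apple\n3,orange\n"

# Read the data into a DataFrame.
fruit_df = pd.read_csv(io.StringIO(csv_text))

# Access an attribute: dtypes describes the type of each column.
print(fruit_df.dtypes)

# Call a method: sum() adds up the values in the 'count' column.
# Bracket access is used because fruit_df.count is itself a DataFrame method.
total = fruit_df["count"].sum()
print(total)
```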

Common DataFrame Functions

Covering everything that Pandas DataFrames can do could be (and probably is) its own book. DataFrames offer a wide range of functionality and are very useful to Python developers. A few of the most common DataFrame functions are included below for reference:

First we can create a simple DataFrame that you can use for testing.

import pandas as pd
myDF = pd.DataFrame([[1, 'pear'], [2, 'apple'], [3, 'orange']], columns=['count', 'fruit'])
print(myDF)
   count   fruit
0      1    pear
1      2   apple
2      3  orange

How do I get the number of rows and columns in my DataFrame?

print(myDF.shape)

shape is an attribute rather than a method, so it takes no parentheses. Its value is a tuple with the number of rows first and the number of columns second:

(3, 2)
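Because shape is an ordinary tuple, it can be unpacked into separate variables. A short sketch, rebuilding the same sample DataFrame so that it runs on its own (the names n_rows and n_cols are illustrative):

```python
import pandas as pd

myDF = pd.DataFrame([[1, 'pear'], [2, 'apple'], [3, 'orange']], columns=['count', 'fruit'])

# Unpack the (rows, columns) tuple into two variables.
n_rows, n_cols = myDF.shape
print(n_rows, n_cols)

# len() on a DataFrame gives the number of rows only.
print(len(myDF))
```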

What if I wanted to know the column names of my DataFrame?

print(myDF.columns)
Index(['count', 'fruit'], dtype='object')

This may not be as helpful when the DataFrame only has two columns. However, this can be very beneficial when working with larger DataFrames.
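With a wider DataFrame it is often more practical to work with the column names than to read them off the screen. The sketch below, again using the small sample DataFrame, converts the column index to a plain list and checks whether a given column exists. The 'price' column is a hypothetical name that is deliberately absent.

```python
import pandas as pd

myDF = pd.DataFrame([[1, 'pear'], [2, 'apple'], [3, 'orange']], columns=['count', 'fruit'])

# Convert the column Index to a plain Python list.
column_names = myDF.columns.tolist()
print(column_names)

# Check whether a column exists before trying to use it.
has_fruit = 'fruit' in myDF.columns
has_price = 'price' in myDF.columns  # hypothetical column, not in the data
print(has_fruit, has_price)
```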

What if I wanted to change the name of one of my columns?

myDF = myDF.rename(columns={'fruit': 'groceries'})

You could also pass the inplace=True argument to modify the existing DataFrame directly instead of getting a new one back:

myDF.rename(columns={'fruit': 'groceries'}, inplace=True)

Either method would result in the fruit column being renamed to groceries in this example.

myDF.rename(columns={'fruit': 'groceries'}, inplace=True)
print(myDF.columns)
Index(['count', 'groceries'], dtype='object')
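The difference between the two forms can be checked directly. In this sketch, which rebuilds the sample DataFrame under the illustrative name original, rename without inplace returns a new DataFrame and leaves the original untouched, while inplace=True changes the original and returns None:

```python
import pandas as pd

original = pd.DataFrame([[1, 'pear'], [2, 'apple'], [3, 'orange']], columns=['count', 'fruit'])

# Without inplace, rename returns a new DataFrame.
renamed = original.rename(columns={'fruit': 'groceries'})
columns_before = list(original.columns)
print(columns_before)         # the original still has 'fruit'
print(list(renamed.columns))  # the copy has 'groceries'

# With inplace=True, the original itself is modified and None is returned.
result = original.rename(columns={'fruit': 'groceries'}, inplace=True)
columns_after = list(original.columns)
print(result, columns_after)
```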

What if I wanted to display the first n rows of my DataFrame?

In this case Pandas has a handy built-in head method. By default head will return the first 5 rows. We can also pass an n= argument to the method if we want a different number of rows:

print(myDF.head())
print(myDF.head(n=2))
   count groceries
0      1      pear
1      2     apple
2      3    orange
   count groceries
0      1      pear
1      2     apple

In this case we have a short DataFrame so the initial head function will print all 3 rows (3 < 5).

What if I have a list of dicts and I want to create a DataFrame?

In Python it’s pretty common that you’ll want to convert some other collection of data objects into a DataFrame. This allows you to view the data in a tabular format and leverage the library of Pandas functions. One of the most common ways to do this is with a list of dicts:

list_of_dicts = []
list_of_dicts.append({'columnA': 1, 'columnB': 2})
list_of_dicts.append({'columnB': 4, 'columnA': 1})

myDF = pd.DataFrame(list_of_dicts)
print(myDF.head())
   columnA  columnB
0        1        2
1        1        4
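Columns are matched by dict key, which is why the differing key order above does not matter. A related case worth knowing: if a dict lacks a key that other dicts have, Pandas fills that cell with NaN. A small sketch with made-up records (the name records is illustrative):

```python
import pandas as pd

records = [
    {'columnA': 1, 'columnB': 2},
    {'columnA': 3},  # this record has no 'columnB' key
]

sparse_df = pd.DataFrame(records)
print(sparse_df)

# isna() marks the cells that were filled in as missing.
missing_b = sparse_df['columnB'].isna().tolist()
print(missing_b)
```

Note that columnB is stored as floating point here, since NaN is a float value.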

Resources

As always, Pandas has great documentation regarding DataFrames. It’s worth a read if you are new to Pandas and working with DataFrames.