DataFrames
What is a DataFrame?
The DataFrame is one of the most commonly used classes in the Pandas package. Much like data.frames in R, DataFrames in Pandas store tabular, two-dimensional datasets.
The most common operations start with reading data into a DataFrame, accessing the DataFrame’s attributes, and using the DataFrame’s methods to perform operations on the underlying data or with other DataFrames.
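As a minimal sketch of that workflow, the example below reads a small CSV from an in-memory string, inspects one attribute, and calls one method. The string stands in for a file on disk, and the column names and values are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file on disk.
csv_text = "count,fruit\n1,pear\n2,apple\n3,orange\n"

# Read the data into a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

# Access an attribute: dtypes reports the data type of each column.
print(df.dtypes)

# Call a method: sum() adds up the values in the 'count' column.
total = df['count'].sum()
print(total)
```

With a real file, the same call would take a path instead of the StringIO object.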
Common DataFrame Functions
The range of functions available on Pandas DataFrames could fill (and probably has filled) its own book. DataFrames offer an amazing range of functionality and are very useful to Python developers. A few of the most common DataFrame functions are included below for reference:
First, we can create a simple DataFrame to use for testing.
import pandas as pd

myDF = pd.DataFrame([[1, 'pear'], [2, 'apple'], [3, 'orange']], columns=['count', 'fruit'])
print(myDF)
   count   fruit
0      1    pear
1      2   apple
2      3  orange
How to get the number of rows and columns in your DataFrame.
print(myDF.shape)
This returns a tuple with the first value as the number of rows and the second as the number of columns:
(3, 2)
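Because shape is an ordinary tuple, it can be unpacked into separate variables. A small sketch, using the same test DataFrame (the variable names n_rows, n_cols, and rows_via_len are just illustrative):

```python
import pandas as pd

myDF = pd.DataFrame([[1, 'pear'], [2, 'apple'], [3, 'orange']], columns=['count', 'fruit'])

# Unpack the (rows, columns) tuple into two variables.
n_rows, n_cols = myDF.shape
print(n_rows, n_cols)

# len() on a DataFrame also returns the number of rows.
rows_via_len = len(myDF)
print(rows_via_len)
```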
What if I wanted to know the column names of my DataFrame?
print(myDF.columns)
Index(['count', 'fruit'], dtype='object')
This may not be as helpful when the DataFrame only has two columns. However, this can be very beneficial when working with larger DataFrames.
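With a wider DataFrame, it is often handier to work with the column names as a plain list, or to check whether a given column exists. A short sketch, where 'price' is a hypothetical column name that is deliberately absent:

```python
import pandas as pd

myDF = pd.DataFrame([[1, 'pear'], [2, 'apple'], [3, 'orange']], columns=['count', 'fruit'])

# Convert the column Index into a regular Python list.
column_names = list(myDF.columns)
print(column_names)

# Membership tests work directly on the column Index.
has_fruit = 'fruit' in myDF.columns
has_price = 'price' in myDF.columns  # hypothetical column, not in this DataFrame
print(has_fruit, has_price)
```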
What if I wanted to change the name of one of my columns?
myDF = myDF.rename(columns={'fruit': 'groceries'})
You could also add the inplace=True argument to make the change directly to the DataFrame:
myDF.rename(columns={'fruit': 'groceries'}, inplace=True)
Either method would result in the fruit column being renamed to groceries in this example.
myDF.rename(columns={'fruit': 'groceries'}, inplace=True)
print(myDF.columns)
Index(['count', 'groceries'], dtype='object')
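One difference between the two approaches is worth seeing side by side: without inplace=True, rename returns a new DataFrame and leaves the original untouched. A minimal sketch (the names original and renamed are illustrative):

```python
import pandas as pd

original = pd.DataFrame([[1, 'pear'], [2, 'apple'], [3, 'orange']], columns=['count', 'fruit'])

# Without inplace=True, rename returns a new DataFrame.
renamed = original.rename(columns={'fruit': 'groceries'})

# The original keeps its column names; only the returned copy changes.
print(list(original.columns))
print(list(renamed.columns))
```

This is why the first form assigns the result back to myDF.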
What if I wanted to display the first n rows of my DataFrame?
In this case, Pandas has a handy built-in head method. By default, head returns the first 5 rows. We can also pass an n= argument if we want a different number of rows:
print(myDF.head())
print(myDF.head(n=2))
   count  groceries
0      1       pear
1      2      apple
2      3     orange
   count  groceries
0      1       pear
1      2      apple
In this case we have a short DataFrame, so the first head call prints all 3 rows (3 < 5).
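The default of 5 is easier to see on a DataFrame with more than 5 rows. A small sketch, using a made-up ten-row DataFrame (longDF and its value column are illustrative):

```python
import pandas as pd

# A hypothetical DataFrame with ten rows, values 0 through 9.
longDF = pd.DataFrame({'value': range(10)})

# With no argument, head returns only the first 5 rows.
first_five = longDF.head()
print(len(first_five))

# With n=2, head returns the first 2 rows.
first_two = longDF.head(n=2)
print(list(first_two['value']))
```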
What if I have a list of dicts and I want to create a DataFrame?
In Python, it’s pretty common to want to convert a collection of other data objects into a DataFrame. This allows you to view the data in a tabular format and leverage the library of Pandas functions. One of the most common ways to do this is with a list of dicts:
list_of_dicts = []
list_of_dicts.append({'columnA': 1, 'columnB': 2})
list_of_dicts.append({'columnB': 4, 'columnA': 1})
myDF = pd.DataFrame(list_of_dicts)
print(myDF.head())
   columnA  columnB
0        1        2
1        1        4
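Note that the order of the keys within each dict does not matter; values are matched to columns by key. The dicts also do not need to share the same keys. As a sketch, assuming a second record with no columnB entry, Pandas fills the gap with NaN:

```python
import pandas as pd

# The second record is deliberately missing 'columnB'.
records = [
    {'columnA': 1, 'columnB': 2},
    {'columnA': 3},
]

df = pd.DataFrame(records)

# The missing value shows up as NaN in the resulting DataFrame.
print(df)

# isna() flags which entries are missing.
missing_b = df['columnB'].isna().tolist()
print(missing_b)
```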
Resources
As always, Pandas has great documentation regarding DataFrames. It’s worth a read if you are new to Pandas and working with DataFrames.