Reshaping in Pandas
pivot method from Pandas is a powerful tool that can be used to reshape a DataFrame.
pivot takes 3 main arguments:
index→ the column of the DataFrame that will be the new index.
columns→ the column of the DataFrame where you would like each unique value to be mapped to a column.
values→ the column of the DataFrame for which you’d like the values to be.
For example, given the sample DataFrame below lets say that we wanted to have attendance for home and away games displayed across the different weeks of the season:
list_1 = ['Wisconsin', 'IU', 'Rutgers', 'Michigan State', 'Ohio State'] list_2 = ['home', 'away', 'home', 'away', 'home'] list_3 = [85, 78, 90, 75, 74] list_4 = [500, 1000, 430, 4800, 10000] list_5 = [1, 2, 3, 4, 5] myDF = pd.DataFrame(zip(list_1, list_2, list_3, list_4, list_5), columns=['opponent', 'location', 'temp', 'attendance', 'week_of_season']) print(myDF.pivot(index='location', columns='week_of_season', values='attendance'))
week_of_season 1 2 3 4 5 location away NaN 1000.0 NaN 4800.0 NaN home 500.0 NaN 430.0 NaN 10000.0
If we wanted to we could also swap the columns and indexes:
print(myDF.pivot(index='week_of_season', columns='location', values='attendance'))
location away home week_of_season 1 NaN 500.0 2 1000.0 NaN 3 NaN 430.0 4 4800.0 NaN 5 NaN 10000.0
As before, you can also use
reset_index() to bring the index back in as a column.
print(myDF.pivot(index='week_of_season', columns='location', values='attendance').reset_index())
location week_of_season away home 0 1 NaN 500.0 1 2 1000.0 NaN 2 3 NaN 430.0 3 4 4800.0 NaN 4 5 NaN 10000.0
pivot is one of the most used Pandas shaping methods, there are many more helpful methods for different situations. It’s recommended to read through the Pandas doc on reshaping to get familiar with the other functions that are available.