Plotly
Being able to analyze data and create good visualizations is an invaluable skill in many fields. It can be pretty fun too! In this document, we will dive into plotting using plotly, a very popular open-source graphing library for creating interactive graphs.
Plotly Express is a simple way to make interactive graphs directly from your data. It works with pandas DataFrames.
Once you create a graph (like a bar chart or scatter plot), you can edit it. You can add titles, change axis labels, adjust colors, or pick a different theme to make it look exactly how you want (similar to Matplotlib). Also, the graphs are interactive—you can hover over points, zoom in, and explore your data. Visit the Plotly Express Documentation for examples and detailed explanations. It’s a great resource to learn about all the different graph types and how to customize them.
In this document we will:
- Create visualizations using plotly.
- Modify axis labels, titles, and other graph elements.
- Customize plots with colors, shapes, and line types.
- Explore dataset preparation and handling missing values.
- Try to understand trends and patterns in the data using Plotly.
Understanding the Data Prior to Plotting
We will use the following dataset(s) to understand plotly:
/anvil/projects/tdm/data/zillow/Metro_time_series.csv
First, let’s review the dataset’s columns to identify which ones might be suitable for creating a plot. We can use the .columns attribute:
import plotly.express as px
import pandas as pd
myDF = pd.read_csv("/anvil/projects/tdm/data/zillow/Metro_time_series.csv")
myDF.columns
Index(['Date', 'RegionName', 'AgeOfInventory', 'DaysOnZillow_AllHomes', 'InventorySeasonallyAdjusted_AllHomes', 'InventoryRaw_AllHomes', 'InventorySeasonallyAdjusted_BottomTier', 'InventorySeasonallyAdjusted_MiddleTier', 'InventorySeasonallyAdjusted_TopTier', 'MedianListingPricePerSqft_1Bedroom', 'MedianListingPricePerSqft_2Bedroom', 'MedianListingPricePerSqft_3Bedroom', 'MedianListingPricePerSqft_4Bedroom', 'MedianListingPricePerSqft_5BedroomOrMore', 'MedianListingPricePerSqft_AllHomes', 'MedianListingPricePerSqft_CondoCoop', 'MedianListingPricePerSqft_DuplexTriplex', 'MedianListingPricePerSqft_SingleFamilyResidence', 'MedianListingPrice_1Bedroom', 'MedianListingPrice_2Bedroom', 'MedianListingPrice_3Bedroom', 'MedianListingPrice_4Bedroom', 'MedianListingPrice_5BedroomOrMore', 'MedianListingPrice_AllHomes', 'MedianListingPrice_CondoCoop', 'MedianListingPrice_DuplexTriplex', 'MedianListingPrice_SingleFamilyResidence', 'MedianPctOfPriceReduction_AllHomes', 'MedianPctOfPriceReduction_CondoCoop', 'MedianPctOfPriceReduction_SingleFamilyResidence', 'MedianPriceCutDollar_AllHomes', 'MedianPriceCutDollar_CondoCoop', 'MedianPriceCutDollar_SingleFamilyResidence', 'MedianRentalPricePerSqft_1Bedroom', 'MedianRentalPricePerSqft_2Bedroom', 'MedianRentalPricePerSqft_3Bedroom', 'MedianRentalPricePerSqft_4Bedroom', 'MedianRentalPricePerSqft_5BedroomOrMore', 'MedianRentalPricePerSqft_AllHomes', 'MedianRentalPricePerSqft_CondoCoop', 'MedianRentalPricePerSqft_DuplexTriplex', 'MedianRentalPricePerSqft_MultiFamilyResidence5PlusUnits', 'MedianRentalPricePerSqft_SingleFamilyResidence', 'MedianRentalPricePerSqft_Studio', 'MedianRentalPrice_1Bedroom', 'MedianRentalPrice_2Bedroom', 'MedianRentalPrice_3Bedroom', 'MedianRentalPrice_4Bedroom', 'MedianRentalPrice_5BedroomOrMore', 'MedianRentalPrice_AllHomes', 'MedianRentalPrice_CondoCoop', 'MedianRentalPrice_DuplexTriplex', 'MedianRentalPrice_MultiFamilyResidence5PlusUnits', 'MedianRentalPrice_SingleFamilyResidence', 'MedianRentalPrice_Studio', 
'ZHVIPerSqft_AllHomes', 'PctOfHomesDecreasingInValues_AllHomes', 'PctOfHomesIncreasingInValues_AllHomes', 'PctOfHomesSellingForGain_AllHomes', 'PctOfHomesSellingForLoss_AllHomes', 'PctOfListingsWithPriceReductionsSeasAdj_AllHomes', 'PctOfListingsWithPriceReductionsSeasAdj_BottomTier', 'PctOfListingsWithPriceReductionsSeasAdj_CondoCoop', 'PctOfListingsWithPriceReductionsSeasAdj_MiddleTier', 'PctOfListingsWithPriceReductionsSeasAdj_SingleFamilyResidence', 'PctOfListingsWithPriceReductionsSeasAdj_TopTier', 'PctOfListingsWithPriceReductions_AllHomes', 'PctOfListingsWithPriceReductions_BottomTier', 'PctOfListingsWithPriceReductions_CondoCoop', 'PctOfListingsWithPriceReductions_MiddleTier', 'PctOfListingsWithPriceReductions_SingleFamilyResidence', 'PctOfListingsWithPriceReductions_TopTier', 'PriceToRentRatio_AllHomes', 'Sale_Counts_Msa', 'Sale_Counts_Seas_Adj_Msa', 'Sale_Prices_Msa', 'InventoryTierShare_BottomTier_AllHomes', 'InventoryTierShare_MiddleTier_AllHomes', 'InventoryTierShare_TopTier_AllHomes', 'ZHVI_1bedroom', 'ZHVI_2bedroom', 'ZHVI_3bedroom', 'ZHVI_4bedroom', 'ZHVI_5BedroomOrMore', 'ZHVI_AllHomes', 'ZHVI_BottomTier', 'ZHVI_CondoCoop', 'ZHVI_MiddleTier', 'ZHVI_SingleFamilyResidence', 'ZHVI_TopTier', 'ZRI_AllHomes', 'ZRI_AllHomesPlusMultifamily', 'ZriPerSqft_AllHomes', 'Zri_MultiFamilyResidenceRental', 'Zri_SingleFamilyResidenceRental'], dtype='object')
As we can see, there are many columns in the dataset! It’s essential to understand the available columns before creating any visualizations.
You can use the .head() and .tail() methods to preview the first and last rows of the data.
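A short sketch of that preview step, which also counts missing values per column. It uses a small made-up DataFrame as a stand-in for the Zillow file, so the region labels and prices here are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Zillow data (values are made up)
preview_df = pd.DataFrame({
    "RegionName": ["A", "B", "C", "D"],
    "MedianListingPrice_1Bedroom": [132130.0, np.nan, 195000.0, np.nan],
})

print(preview_df.head(2))  # first two rows
print(preview_df.tail(2))  # last two rows

# Count missing values per column before deciding what to plot
missing_counts = preview_df.isna().sum()
print(missing_counts)
```

Columns with many missing values will need filtering (for example with .notnull() or .dropna()) before they are plotted.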
Barplots Using Plotly
Let’s start with a bar plot of MedianListingPrice_1Bedroom for the first 10 rows where that value is not null. The dataset has missing values, so we need to filter them out first.
myDF = pd.read_csv("/anvil/projects/tdm/data/zillow/Metro_time_series.csv")
filteredDF = myDF[myDF['MedianListingPrice_1Bedroom'].notnull()][['RegionName', 'MedianListingPrice_1Bedroom']].head(10)
print(filteredDF)
       RegionName  MedianListingPrice_1Bedroom
124298      12060                     132130.0
124313      12700                     249900.0
124352      14460                     195000.0
124359      14720                     324000.0
124367      15180                     113750.0
124412      17200                      98500.0
124594      25540                      95000.0
124605      25940                     449000.0
124799      34820                     106900.0
124810      35300                     135000.0
Now that we see the values we will be plotting, let’s use the plotly library to plot a bar chart of this data!
fig = px.bar(
    filteredDF,
    x='RegionName',
    y='MedianListingPrice_1Bedroom',
    title='Median Listing Price for 1-Bedroom Homes by Region (First 10)',
    labels={'RegionName': 'Region', 'MedianListingPrice_1Bedroom': 'Median Listing Price ($)'}
)
fig.update_layout(
    xaxis_title="Region",
    yaxis_title="Median Listing Price ($)",
    template="plotly_white"
)
fig.show()
Boxplots Using Plotly
Now let’s use plotly to create a box plot.
A box plot (also known as a box-and-whisker plot) displays the distribution of data based on a five-number summary:
- Minimum
- First quartile (Q1)
- Median (Q2)
- Third quartile (Q3)
- Maximum
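To make these five statistics concrete, here is a minimal sketch that computes them with pandas on a tiny made-up series. The values are illustrative only and are not taken from the Zillow data.

```python
import pandas as pd

# Made-up values chosen so the summary is easy to check by hand
values = pd.Series([1, 2, 3, 4, 5])

five_number_summary = {
    "min": values.min(),
    "q1": values.quantile(0.25),
    "median": values.median(),
    "q3": values.quantile(0.75),
    "max": values.max(),
}
print(five_number_summary)
```

Note that pandas interpolates quartiles linearly by default, so its Q1 and Q3 may differ slightly from the quartiles a plotting library draws for the same data.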
A box plot requires numerical data to display the distribution and summarize its key statistical properties. Let’s use the column DaysOnZillow_AllHomes and create a new column called Year to create a box plot in plotly. Note that we will need to parse the Date column to be able to create the Year column.
import pandas as pd
import plotly.express as px
myDF = pd.read_csv("/anvil/projects/tdm/data/zillow/Metro_time_series.csv")
# Parse the Date column and extract the year into a new column
myDF['Year'] = pd.to_datetime(myDF['Date']).dt.year
# Keep only the rows where DaysOnZillow_AllHomes is present
filtered_days_zillow = myDF[myDF['DaysOnZillow_AllHomes'].notnull()][['Year', 'DaysOnZillow_AllHomes']]
fig = px.box(
    filtered_days_zillow,
    x="Year",
    y="DaysOnZillow_AllHomes",
    title="Box Plot of Days on Zillow by Year",
)
fig.show()
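The year extraction used above can be tried in isolation. This sketch assumes the dates are ISO-style strings such as "1996-04-30"; the sample strings are hypothetical and only illustrate the call.

```python
import pandas as pd

# Hypothetical date strings; the real Date column's format is assumed
date_strings = pd.Series(["1996-04-30", "2010-01-31", "2017-12-31"])

# Convert to datetimes, then pull out the year component
years = pd.to_datetime(date_strings).dt.year
print(years.tolist())
```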
Histograms with Plotly
import plotly.express as px
import pandas as pd
myDF = pd.read_csv("/anvil/projects/tdm/data/zillow/Metro_time_series.csv")
filtered_data = myDF[myDF['DaysOnZillow_AllHomes'].notnull()]
fig = px.histogram(
    filtered_data,
    x="DaysOnZillow_AllHomes",
    nbins=100,  # number of bins; adjust as needed
    title="Histogram of Days on Zillow",
    labels={"DaysOnZillow_AllHomes": "Days on Zillow"}
)
fig.show()
The histogram helps us understand the distribution of the number of days homes stay listed on Zillow. Notice how outliers appear as very small bars at the extreme ends of the distribution, holding very few values compared to the rest of the dataset. Some properties might genuinely stay on Zillow for much longer due to location, pricing, or other factors.
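If you want to count how many values sit in those extreme bars, one common rule of thumb (not something this document's plots rely on) flags values more than 1.5 times the interquartile range beyond the quartiles. A minimal sketch on made-up numbers:

```python
import pandas as pd

# Made-up listing durations with one unusually long listing
days = pd.Series([60, 65, 70, 72, 75, 78, 80, 85, 90, 300])

q1 = days.quantile(0.25)
q3 = days.quantile(0.75)
iqr = q3 - q1

# Values outside these fences are treated as outliers under this rule
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = days[(days < lower_fence) | (days > upper_fence)]
print(outliers.tolist())
```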
Scatterplots with Plotly
Scatterplots in Plotly visually show relationships between two variables. Each point on the plot represents a data observation, with its position determined by the x and y values. You can plot a scatterplot in Plotly just as you can using Matplotlib.
Let’s plot a scatterplot of MedianListingPrice_1Bedroom vs DaysOnZillow_AllHomes.
import pandas as pd
import plotly.express as px
myDF = pd.read_csv("/anvil/projects/tdm/data/zillow/Metro_time_series.csv")
filteredDF_scatter = myDF.dropna(subset=['DaysOnZillow_AllHomes', 'MedianListingPrice_1Bedroom'])
fig = px.scatter(
    filteredDF_scatter,
    x='DaysOnZillow_AllHomes',
    y='MedianListingPrice_1Bedroom',
    title='Days on Zillow vs Median Listing Price for 1-Bedroom Homes',
    labels={'DaysOnZillow_AllHomes': 'Days on Zillow (All Homes)',
            'MedianListingPrice_1Bedroom': 'Median Listing Price (1-Bedroom)'}
)
fig.show()
Based on the scatterplot, we can see some interesting things! There appears to be a general downward trend between the number of days a home is listed on Zillow (DaysOnZillow_AllHomes) and the median listing price for 1-bedroom homes (MedianListingPrice_1Bedroom). This suggests that higher-priced 1-bedroom homes tend to spend fewer days on Zillow. Also, we can see that a significant concentration of points is clustered around lower listing prices (below $400K) and shorter listing durations (less than 150 days).
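A visual trend like this can be checked numerically with a correlation coefficient. The sketch below uses made-up numbers that merely mimic a downward pattern; on the real data you would run the same call on the filtered DataFrame, and the result there may differ.

```python
import pandas as pd

# Made-up values that mimic a downward trend (not real Zillow data)
trend_df = pd.DataFrame({
    "DaysOnZillow_AllHomes": [60, 80, 100, 120, 140],
    "MedianListingPrice_1Bedroom": [400000, 320000, 250000, 200000, 150000],
})

# Pearson correlation: negative values indicate a downward relationship
r = trend_df["DaysOnZillow_AllHomes"].corr(trend_df["MedianListingPrice_1Bedroom"])
print(round(r, 2))
```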
Notice how the syntax for these visualizations in Plotly remains largely consistent. You only need to change the function call—whether it’s px.scatter, px.box, or px.histogram—while keeping most of the parameters the same!
Plotly Live Demonstration
While this document provides an introduction for using Plotly, watching a live demonstration can significantly enhance your understanding. Dr. Ward has created a video that walks through how to use Plotly, explaining the process step-by-step. This video is particularly helpful for seeing how to apply these techniques and gain deeper insights into Plotly’s capabilities.
I highly recommend taking some time to watch the video, as it complements this document and provides an additional example.
In the video below, Dr. Ward introduces the Zillow dataset and provides resources for learning Plotly:
Introduction to plotting Zillow data with plotly express:
Brief example about making a bar plot in plotly express:
Box plots in plotly: