Plotly

Being able to analyze and create good visualizations is a skill that is invaluable in many fields. It can be pretty fun too! In this document, we will dive into ploting using plotly, a very popular source graphing library that can interact graphs online.

Plotly Express is a simple way to make interactive graphs directly from your data. It works with pandas DataFrames.

Once you create a graph (like a bar chart or scatter plot), you can edit it. You can add titles, change axis labels, adjust colors, or pick a different theme to make it look exactly how you want (similar to Matplotlib). Also, the graphs are interactive—you can hover over points, zoom in, and explore your data. Visit the Plotly Express Documentation for examples and detailed explanations. It’s a great resource to learn about all the different graph types and how to customize them.

In this document we will:

  • Create visualizations using plotly.

  • Modify axis labels, titles, and other graph elements.

  • Customize plots with colors, shapes, and line types.

  • Explore dataset preparation and handling missing values.

  • Try to understanding trends and patterns in the data using Plotly.

Understanding the Data Prior to Plotting

We will use the following dataset(s) to understand plotly:

/anvil/projects/tdm/data/zillow/Metro_time_series.csv

First, let’s review the dataset’s columns to identify which ones might be suitable for creating a plot. We can use the .columns function

import plotly.express as px
import pandas as pd

myDF = pd.read_csv("/anvil/projects/tdm/data/zillow/Metro_time_series.csv")
myDF.columns
Index(['Date', 'RegionName', 'AgeOfInventory', 'DaysOnZillow_AllHomes',
       'InventorySeasonallyAdjusted_AllHomes', 'InventoryRaw_AllHomes',
       'InventorySeasonallyAdjusted_BottomTier',
       'InventorySeasonallyAdjusted_MiddleTier',
       'InventorySeasonallyAdjusted_TopTier',
       'MedianListingPricePerSqft_1Bedroom',
       'MedianListingPricePerSqft_2Bedroom',
       'MedianListingPricePerSqft_3Bedroom',
       'MedianListingPricePerSqft_4Bedroom',
       'MedianListingPricePerSqft_5BedroomOrMore',
       'MedianListingPricePerSqft_AllHomes',
       'MedianListingPricePerSqft_CondoCoop',
       'MedianListingPricePerSqft_DuplexTriplex',
       'MedianListingPricePerSqft_SingleFamilyResidence',
       'MedianListingPrice_1Bedroom', 'MedianListingPrice_2Bedroom',
       'MedianListingPrice_3Bedroom', 'MedianListingPrice_4Bedroom',
       'MedianListingPrice_5BedroomOrMore', 'MedianListingPrice_AllHomes',
       'MedianListingPrice_CondoCoop', 'MedianListingPrice_DuplexTriplex',
       'MedianListingPrice_SingleFamilyResidence',
       'MedianPctOfPriceReduction_AllHomes',
       'MedianPctOfPriceReduction_CondoCoop',
       'MedianPctOfPriceReduction_SingleFamilyResidence',
       'MedianPriceCutDollar_AllHomes', 'MedianPriceCutDollar_CondoCoop',
       'MedianPriceCutDollar_SingleFamilyResidence',
       'MedianRentalPricePerSqft_1Bedroom',
       'MedianRentalPricePerSqft_2Bedroom',
       'MedianRentalPricePerSqft_3Bedroom',
       'MedianRentalPricePerSqft_4Bedroom',
       'MedianRentalPricePerSqft_5BedroomOrMore',
       'MedianRentalPricePerSqft_AllHomes',
       'MedianRentalPricePerSqft_CondoCoop',
       'MedianRentalPricePerSqft_DuplexTriplex',
       'MedianRentalPricePerSqft_MultiFamilyResidence5PlusUnits',
       'MedianRentalPricePerSqft_SingleFamilyResidence',
       'MedianRentalPricePerSqft_Studio', 'MedianRentalPrice_1Bedroom',
       'MedianRentalPrice_2Bedroom', 'MedianRentalPrice_3Bedroom',
       'MedianRentalPrice_4Bedroom', 'MedianRentalPrice_5BedroomOrMore',
       'MedianRentalPrice_AllHomes', 'MedianRentalPrice_CondoCoop',
       'MedianRentalPrice_DuplexTriplex',
       'MedianRentalPrice_MultiFamilyResidence5PlusUnits',
       'MedianRentalPrice_SingleFamilyResidence', 'MedianRentalPrice_Studio',
       'ZHVIPerSqft_AllHomes', 'PctOfHomesDecreasingInValues_AllHomes',
       'PctOfHomesIncreasingInValues_AllHomes',
       'PctOfHomesSellingForGain_AllHomes',
       'PctOfHomesSellingForLoss_AllHomes',
       'PctOfListingsWithPriceReductionsSeasAdj_AllHomes',
       'PctOfListingsWithPriceReductionsSeasAdj_BottomTier',
       'PctOfListingsWithPriceReductionsSeasAdj_CondoCoop',
       'PctOfListingsWithPriceReductionsSeasAdj_MiddleTier',
       'PctOfListingsWithPriceReductionsSeasAdj_SingleFamilyResidence',
       'PctOfListingsWithPriceReductionsSeasAdj_TopTier',
       'PctOfListingsWithPriceReductions_AllHomes',
       'PctOfListingsWithPriceReductions_BottomTier',
       'PctOfListingsWithPriceReductions_CondoCoop',
       'PctOfListingsWithPriceReductions_MiddleTier',
       'PctOfListingsWithPriceReductions_SingleFamilyResidence',
       'PctOfListingsWithPriceReductions_TopTier', 'PriceToRentRatio_AllHomes',
       'Sale_Counts_Msa', 'Sale_Counts_Seas_Adj_Msa', 'Sale_Prices_Msa',
       'InventoryTierShare_BottomTier_AllHomes',
       'InventoryTierShare_MiddleTier_AllHomes',
       'InventoryTierShare_TopTier_AllHomes', 'ZHVI_1bedroom', 'ZHVI_2bedroom',
       'ZHVI_3bedroom', 'ZHVI_4bedroom', 'ZHVI_5BedroomOrMore',
       'ZHVI_AllHomes', 'ZHVI_BottomTier', 'ZHVI_CondoCoop', 'ZHVI_MiddleTier',
       'ZHVI_SingleFamilyResidence', 'ZHVI_TopTier', 'ZRI_AllHomes',
       'ZRI_AllHomesPlusMultifamily', 'ZriPerSqft_AllHomes',
       'Zri_MultiFamilyResidenceRental', 'Zri_SingleFamilyResidenceRental'],
      dtype='object')

As we can see, there are many columns in the dataset! It’s essential to understand the available columns before creating any visualizations.

You can use the .head() function and the .tail() function to get a further preview of the data.

Barplots Using Plotly

Let’s start by plotting a Barplot of MedianListingPrice_1Bedroom and the first 10 regions that are not null. There are some missing values in our dataset, so we will need to filter for when they are not null.

myDF = pd.read_csv("/anvil/projects/tdm/data/zillow/Metro_time_series.csv")
filteredDF = myDF[myDF['MedianListingPrice_1Bedroom'].notnull()][['RegionName', 'MedianListingPrice_1Bedroom']].head(10)
print(filteredDF)
 RegionName  MedianListingPrice_1Bedroom
124298      12060                     132130.0
124313      12700                     249900.0
124352      14460                     195000.0
124359      14720                     324000.0
124367      15180                     113750.0
124412      17200                      98500.0
124594      25540                      95000.0
124605      25940                     449000.0
124799      34820                     106900.0
124810      35300                     135000.0

Now that we see the values we will be plotting, let’s use the plotly library to plot a barchart of this data!

fig = px.bar(
    filteredDF,
    x='RegionName',
    y='MedianListingPrice_1Bedroom',
    title='Median Listing Price for 1-Bedroom Homes by Region (First 10)',
    labels={'RegionName': 'Region', 'MedianListingPrice_1Bedroom': 'Median Listing Price ($)'}
)

fig.update_layout(
    xaxis_title="Region",
    yaxis_title="Median Listing Price ($)",
    template="plotly_white"
)

fig.show()
Initial Bar Plot using Plotly
Figure 1. First bar plot

Boxplots Using Plotly

Now let’s use plotly to plot a box-plot.

A box plot (also known as a box-and-whisker plot) is a way of displaying the distribution of data based on the statistics:

  • Minimum

  • First quartile (Q1)

  • Median (Q2)

  • Third quartile (Q3)

  • Maximum

A box plot requires numerical data to display the distribution and summarize its key statistical properties. Let’s use the column DaysOnZillow_AllHomes and create a new column called Year to create a box-plot in plotly. Note, we will need to clean up the Date column to be able to create the Year column.

import pandas as pd
import plotly.express as px
myDF = pd.read_csv("/anvil/projects/tdm/data/zillow/Metro_time_series.csv")

# Clean up year
myDF['Year'] = pd.to_datetime(myDF['Date']).dt.year

filtered_days_zillow = myDF[myDF['DaysOnZillow_AllHomes'].notnull()][['Year', 'DaysOnZillow_AllHomes']]

fig = px.box(
    filtered_days_zillow,
    x="Year",
    y="DaysOnZillow_AllHomes",
    title="Box Plot of Days on Zillow by Year",
)
fig.show()
Box-plot using Plotly
Figure 2. First Box-plot Plotly

Histograms with Plotly

import plotly.express as px
import pandas as pd

myDF = pd.read_csv("/anvil/projects/tdm/data/zillow/Metro_time_series.csv")

filtered_data = myDF[myDF['DaysOnZillow_AllHomes'].notnull()]

fig = px.histogram(
    filtered_data,
    x="DaysOnZillow_AllHomes",
    nbins=100,  # Adjustable
    title="Histogram of Days on Zillow",
    labels={"DaysOnZillow_AllHomes": "Days on Zillow"}
)

fig.show()
Histograms using Plotly
Figure 3. Histogram using Plotly

Notice how outliers in the histogram appear as really small bars at the extreme ends of the distribution, with very few values compared to the rest of the dataset. The histogram helps us understand the distribution of the number of days the houses are on zillow. Some properties might genuinely stay on Zillow for much longer due to location, pricing, or other factors.

Scatterplots with Plotly

Scatterplots in Plotly visually show relationships between two variables. Each point on the plot represents a data observation, with its position determined by the x and y values. You can plot a scatterplot in Plotly just as you can using Matplotlib.

Let’s plot a scatterplot of MedianListingPrice_1Bedroom vs DaysOnZillow_AllHomes.

import pandas as pd
import plotly.express as px

myDF = pd.read_csv("/anvil/projects/tdm/data/zillow/Metro_time_series.csv")

filteredDF_scatter = myDF.dropna(subset=['DaysOnZillow_AllHomes', 'MedianListingPrice_1Bedroom'])

fig = px.scatter(
    filteredDF_scatter,
    x='DaysOnZillow_AllHomes',
    y='MedianListingPrice_1Bedroom',
    title='Days on Zillow vs Median Listing Price for 1-Bedroom Homes',
    labels={'DaysOnZillow_AllHomes': 'Days on Zillow (All Homes)',
            'MedianListingPrice_1Bedroom': 'Median Listing Price (1-Bedroom)'}
)

fig.show()
Scatterplots using Plotly
Figure 4. Scatterplots using Plotly

Based on the scatterplot, we can see find see some interesting things! There appears to be a general downward trend between the number of days a home is listed on Zillow DaysOnZillow_AllHomes and the median listing price for 1-bedroom homes MedianListingPrice_1Bedroom. This suggests that higher-priced 1-bedroom homes tend to spend fewer days on Zillow. Also, we can see that a significant concentration of points is clustered around lower listing prices (below $400K) and shorter listing durations (less than 150 days).

Notice how the syntax for these visualizations in Plotly remains largely consistent. You only need to change the function call—whether it’s px.scatter, px.box, or px.histogram—while keeping most of the parameters the same!

Plotly Live Demonstration

While this document provides an introduction for using Plotly, watching a live demonstration can significantly enhance your understanding. Dr. Ward has created a video that walks through how to use Plotly, explaining the process step-by-step. This video is particularly helpful for seeing how to apply these techniques and gain deeper insights into Plotly’s capabilities.

I highly recommend taking some time to watch the video, as it complements this document and provides an additional example.

In the video below, Dr. Ward introduces the Zillow dataset and provides resources for learning Plotly:

Introduction to plotting Zillow data with plotly express:

Brief example about making a bar plot in plotly express:

Box plots in plotly: