Spotify

Source

We use two Spotify datasets. The first is available for download at www.kaggle.com/datasets/ksuqing/spotify-data-1986-2023

The second was obtained from github.com/Juanfra21/data-science/blob/main/Final-Project/spotify_dataset.csv

Description of the Data

Spotify Data 1

The first Spotify dataset contains Spotify tracks with detailed audio features and a popularity score (0–100).

Here is information on the variables in the dataset:

Feature	Description
track_id	Unique identifier for the track
track_name	Name of the track
popularity	Popularity score (0–100) based on Spotify plays
available_markets	Markets/countries where the track is available
disc_number	Disc number (for albums with multiple discs)
duration_ms	Track duration in milliseconds
explicit	Whether the track contains explicit content (True/False)
track_number	Position of the track within the album
href	Spotify API endpoint URL for the track
album_id	Unique identifier for the album
album_name	Name of the album
album_release_date	Release date of the album
album_type	Album type (album, single, compilation)
album_total_tracks	Total number of tracks in the album
artists_names	Names of the artists on the track
artists_ids	Unique identifiers of the artists
principal_artist_id	ID of the principal/primary artist
principal_artist_name	Name of the principal/primary artist
artist_genres	Genres associated with the principal artist
principal_artist_followers	Number of Spotify followers of the principal artist
acousticness	Confidence measure of whether the track is acoustic (0–1)
analysis_url	Spotify API URL for detailed track analysis
danceability	How suitable a track is for dancing (0–1)
energy	Intensity and activity measure of the track (0–1)
instrumentalness	Predicts whether a track contains no vocals (0–1; values near 1 indicate an instrumental track)
key	Estimated key of the track (integer, e.g. 0=C, 1=C#/Db)
liveness	Presence of an audience in the recording (0–1)
loudness	Overall loudness of the track in decibels (dB)
mode	Modality of the track (1=major, 0=minor)
speechiness	Presence of spoken words (0–1)
tempo	Estimated tempo in beats per minute (BPM)
time_signature	Estimated overall time signature
valence	Musical positivity/happiness of the track (0–1)
year	Year the track was released
duration_min	Track duration in minutes
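
The encoded columns in the table can be made easier to read with a small helper. The following is a minimal sketch, not part of the dataset itself: the function names and the example row are hypothetical, it assumes that duration_min is simply duration_ms divided by 60,000, and it treats any key value outside 0–11 as "unknown" (Spotify's API documentation uses -1 when no key is detected).

```python
# Pitch-class names for the integer `key` column (0=C, 1=C#/Db, ..., 11=B).
PITCH_CLASSES = ["C", "C#/Db", "D", "D#/Eb", "E", "F",
                 "F#/Gb", "G", "G#/Ab", "A", "A#/Bb", "B"]


def describe_key(key: int, mode: int) -> str:
    """Combine the `key` and `mode` columns into a label such as 'A minor'."""
    if not 0 <= key < len(PITCH_CLASSES):
        return "unknown"  # assumption: out-of-range values mean no key was detected
    quality = "major" if mode == 1 else "minor"
    return f"{PITCH_CLASSES[key]} {quality}"


def ms_to_minutes(duration_ms: int) -> float:
    """Convert `duration_ms` to minutes (assumed definition of `duration_min`)."""
    return duration_ms / 60_000


# Hypothetical row, for illustration only.
row = {"key": 9, "mode": 0, "duration_ms": 210_000}
print(describe_key(row["key"], row["mode"]))  # A minor
print(ms_to_minutes(row["duration_ms"]))      # 3.5
```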

Spotify Data 2

This dataset contains 114,000 Spotify tracks with their audio features and related metadata. Each row corresponds to a single track. The data was obtained through Spotify’s Web API and includes both numerical and descriptive features.

More information about each column is available at huggingface.co/datasets/maharshipandya/spotify-tracks-dataset

Transformations to the original data source

In one of our projects that uses Spotify Data 1, a random sample was drawn from the original dataset for the purpose of learning linear regression.
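
The sampling step can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used: the sample size, the seed, and the choice of energy and popularity as the regression variables are hypothetical, and the rows are synthetic stand-ins rather than real tracks. Only the Python standard library is used.

```python
import random


def sample_rows(rows, k, seed=0):
    """Draw k rows without replacement; a fixed seed makes the draw repeatable."""
    rng = random.Random(seed)
    return rng.sample(rows, k)


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic stand-in for the full table (assumption): popularity is an
# exact linear function of energy, so the fit should recover it.
full = [{"energy": i / 100, "popularity": 20 + 50 * (i / 100)} for i in range(101)]

subset = sample_rows(full, k=30, seed=42)
slope, intercept = fit_line(
    [r["energy"] for r in subset],
    [r["popularity"] for r in subset],
)
print(len(subset), round(slope, 6), round(intercept, 6))
```

Fixing the seed means the same subset is drawn on every run, which keeps regression results comparable between sessions.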