Spotify

Source

We have two Spotify datasets. The first dataset is available for download here: www.kaggle.com/datasets/ksuqing/spotify-data-1986-2023

Description of the Data

Spotify Data 1

The first Spotify dataset contains Spotify tracks with detailed audio features and a popularity score (0–100).

Here is information on the variables in the dataset:

Feature                     Description
track_id                    Unique identifier for the track
track_name                  Name of the track
popularity                  Popularity score (0–100) based on Spotify plays
available_markets           Markets/countries where the track is available
disc_number                 Disc number (for albums with multiple discs)
duration_ms                 Track duration in milliseconds
explicit                    Whether the track contains explicit content (True/False)
track_number                Position of the track within the album
href                        Spotify API endpoint URL for the track
album_id                    Unique identifier for the album
album_name                  Name of the album
album_release_date          Release date of the album
album_type                  Album type (album, single, compilation)
album_total_tracks          Total number of tracks in the album
artists_names               Names of the artists on the track
artists_ids                 Unique identifiers of the artists
principal_artist_id         ID of the principal/primary artist
principal_artist_name       Name of the principal/primary artist
artist_genres               Genres associated with the principal artist
principal_artist_followers  Number of Spotify followers of the principal artist
acousticness                Confidence measure of whether the track is acoustic (0–1)
analysis_url                Spotify API URL for detailed track analysis
danceability                How suitable a track is for dancing (0–1)
energy                      Intensity and activity measure of the track (0–1)
instrumentalness            Predicts whether a track contains no vocals (0–1; values closer to 1 indicate a more likely instrumental track)
key                         Estimated key of the track (integer, e.g. 0=C, 1=C#/Db)
liveness                    Presence of an audience in the recording (0–1)
loudness                    Overall loudness of the track in decibels (dB)
mode                        Modality of the track (1=major, 0=minor)
speechiness                 Presence of spoken words (0–1)
tempo                       Estimated tempo in beats per minute (BPM)
time_signature              Estimated overall time signature
valence                     Musical positivity/happiness of the track (0–1)
year                        Year the track was released
duration_min                Track duration in minutes
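
As a rough illustration of how a few of these columns can be read, the sketch below decodes key and mode into a readable label and converts duration_ms to minutes. It assumes the standard pitch-class numbering (0=C through 11=B) and treats any value outside that range as unknown. It also assumes duration_min is simply duration_ms divided by 60,000; the dataset page does not state how that column was computed. The function names and sample values are hypothetical.

```python
# Pitch-class names for the integer `key` column (0=C, 1=C#/Db, ..., 11=B).
PITCH_CLASSES = ["C", "C#/Db", "D", "D#/Eb", "E", "F",
                 "F#/Gb", "G", "G#/Ab", "A", "A#/Bb", "B"]


def key_label(key: int, mode: int) -> str:
    """Return a label such as 'A minor'; 'unknown' if key is outside 0-11."""
    if not 0 <= key < len(PITCH_CLASSES):
        return "unknown"
    quality = "major" if mode == 1 else "minor"
    return f"{PITCH_CLASSES[key]} {quality}"


def ms_to_minutes(duration_ms: int) -> float:
    """Convert milliseconds to minutes (assumed definition of duration_min)."""
    return duration_ms / 60_000


# Hypothetical track values, not taken from the dataset.
print(key_label(9, 0))          # A minor
print(ms_to_minutes(210_000))   # 3.5
```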

Spotify Data 2

This dataset contains 114,000 Spotify tracks with their audio features and related metadata. Each row corresponds to an individual track. The data was obtained through Spotify's Web API and includes both numerical and descriptive features.

More information about each of the columns is available here: huggingface.co/datasets/maharshipandya/spotify-tracks-dataset

Transformations to the original data source

In one of our projects that uses Spotify Data 1, the data was randomly sampled from the original dataset to support learning linear regression.
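
A minimal sketch of how such a random sample could be drawn is shown below, using only Python's standard library. The sample size, the seed, the helper name sample_rows, and the toy rows are all assumptions for illustration; the source does not state how many rows were kept or which tool was used.

```python
import random


def sample_rows(rows, n, seed=0):
    """Draw n rows without replacement; a fixed seed makes the draw repeatable."""
    rng = random.Random(seed)
    return rng.sample(rows, n)


# Toy stand-in for the full dataset (hypothetical values).
tracks = [{"track_id": f"t{i}", "popularity": i % 101} for i in range(1000)]

subset = sample_rows(tracks, n=100, seed=42)
print(len(subset))  # 100
```

Fixing the seed is a design choice that lets everyone working on the project obtain the same subset; dropping it would give a different sample on each run.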