RAVDESS

www.kaggle.com/datasets/uwrfkaggler/ravdess-emotional-speech-audio

Description of the Data

RAVDESS (the Ryerson Audio-Visual Database of Emotional Speech and Song) is a dataset of acted emotional speech recordings, intended for tasks such as speech emotion classification and other speech and audio processing work.

The dataset has the following directory structure:

ravdess_data/
├── Actor_01/
│   ├── 03-01-01-01-01-01-01.wav  (neutral)
│   ├── 03-01-02-01-01-01-01.wav  (calm)
│   ├── 03-01-03-01-01-01-01.wav  (happy)
│   ├── 03-01-04-01-01-01-01.wav  (sad)
│   ├── 03-01-05-01-01-01-01.wav  (angry)
│   ├── 03-01-06-01-01-01-01.wav  (fearful)
│   ├── 03-01-07-01-01-01-01.wav  (disgust)
│   ├── 03-01-08-01-01-01-01.wav  (surprised)
│   ├── 03-01-02-02-01-01-01.wav  (calm, strong intensity)
│   └── ... (more files with different statements/repetitions)
├── Actor_02/
│   ├── 03-01-01-01-01-01-02.wav  (neutral)
│   ├── 03-01-03-01-01-01-02.wav  (happy)
│   └── ... (similar pattern for all emotions)
├── Actor_03/
│   └── ... (continues for all 24 actors)
└── ...
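
As a quick sanity check on the layout above, the files can be counted per actor folder. This is a minimal sketch, assuming the data sits under a local root such as ravdess_data/ with one Actor_XX folder per actor; the function name count_wavs_per_actor is hypothetical and not part of the dataset.

```python
from collections import Counter
from pathlib import Path


def count_wavs_per_actor(root):
    """Return a Counter mapping each Actor_XX folder name to its .wav count."""
    counts = Counter()
    # Recordings sit one level below the root, inside an Actor_XX folder.
    for wav_path in sorted(Path(root).glob("Actor_*/*.wav")):
        counts[wav_path.parent.name] += 1
    return counts
```

Calling count_wavs_per_actor("ravdess_data") should list every actor folder with the same number of files; a folder with a different count suggests an incomplete copy.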

File naming convention: modality-vocal_channel-emotion-intensity-statement-repetition-actor.wav

Modality: 01 = audio+video, 02 = video-only, 03 = audio-only (we’ll use these)
Vocal channel: 01 = speech, 02 = song
Emotion: 01=neutral, 02=calm, 03=happy, 04=sad, 05=angry, 06=fearful, 07=disgust, 08=surprised
Intensity: 01 = normal, 02 = strong
Statement: 01 = "Kids are talking by the door", 02 = "Dogs are sitting by the door"
Repetition: 01 = first repetition, 02 = second repetition
Actor: 01-24 = actor ID
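
The naming convention above can be decoded in a few lines of Python. This is only a sketch: the names FIELDS, EMOTIONS and parse_ravdess_name are made up for illustration, and the emotion table is copied from the list above.

```python
from pathlib import Path

# Field order follows the file naming convention described above.
FIELDS = ("modality", "vocal_channel", "emotion", "intensity",
          "statement", "repetition", "actor")

EMOTIONS = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}


def parse_ravdess_name(path):
    """Split a RAVDESS file name into its seven two-digit codes."""
    codes = Path(path).stem.split("-")
    if len(codes) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(codes)}: {path}")
    info = dict(zip(FIELDS, codes))
    # Add a human-readable label alongside the raw emotion code.
    info["emotion_label"] = EMOTIONS[info["emotion"]]
    return info


info = parse_ravdess_name("ravdess_data/Actor_01/03-01-05-01-01-01-01.wav")
print(info["emotion_label"], info["actor"])  # angry 01
```

The codes are kept as strings so the leading zeros survive; convert with int() where a numeric actor ID is more convenient.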

Transformations to the original data source

The dataset on Anvil is unmodified; it is stored exactly as downloaded from Kaggle.