An Investigation of a Feature-Level Fusion for Noisy Speech Emotion Recognition

Sara Sekkate

Mohammed Khalil

Abdellah Adib and Sofia Ben Jebara

Abstract

Because the choice of an effective feature representation is one of the key issues in improving the performance of Speech Emotion Recognition (SER) systems, most research has focused on feature-level fusion using large feature sets. In this study, we propose a relatively low-dimensional feature set that combines three kinds of features: baseline Mel Frequency Cepstral Coefficients (MFCCs), MFCCs derived from Discrete Wavelet Transform (DWT) sub-band coefficients, denoted DMFCC, and pitch-based features. The performance of the proposed feature extraction method is evaluated in clean conditions and in the presence of several real-world noises. Conventional Machine Learning (ML) and Deep Learning (DL) classifiers are employed for comparison. The proposal is tested on speech utterances from both the Berlin German Emotional Database (EMO-DB) and the Interactive Emotional Dyadic Motion Capture (IEMOCAP) database through speaker-independent experiments. Experimental results show improvement in speech emotion detection over baselines.
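
As a rough illustration of what feature-level fusion means here, the sketch below concatenates per-utterance summaries of three frame-level feature streams into a single vector. It is not the authors' implementation: the function names, the choice of mean and standard deviation as summary statistics, the feature dimensions, and the random placeholder data are all assumptions made for illustration, and the actual MFCC, DMFCC, and pitch extraction steps are omitted.

```python
import numpy as np


def utterance_stats(frames: np.ndarray) -> np.ndarray:
    """Summarise a (n_frames, n_dims) matrix as per-dimension mean and std.

    The mean/std summary is an assumption; the abstract does not say which
    statistics are used.
    """
    return np.concatenate([frames.mean(axis=0), frames.std(axis=0)])


def fuse_features(mfcc: np.ndarray, dmfcc: np.ndarray, pitch: np.ndarray) -> np.ndarray:
    """Feature-level fusion: concatenate the per-utterance summaries
    of each stream into one low-dimensional vector."""
    return np.concatenate([utterance_stats(m) for m in (mfcc, dmfcc, pitch)])


# Placeholder frame-level features for one utterance (hypothetical sizes:
# 120 frames, 13 MFCCs, 13 DMFCCs, 1 pitch value per frame).
rng = np.random.default_rng(0)
mfcc = rng.normal(size=(120, 13))
dmfcc = rng.normal(size=(120, 13))
pitch = rng.normal(size=(120, 1))

fused = fuse_features(mfcc, dmfcc, pitch)
print(fused.shape)  # (54,) = 2 statistics * (13 + 13 + 1) dimensions
```

The fused vector would then be passed to whichever classifier is being compared; fusing at the feature level keeps a single model per experiment rather than one per feature stream.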

Keywords

speech emotion recognition - feature fusion - SVM - naive Bayes - wavelet

PAGES

pp. 0 - 0

ISSUE

Volume: 8 Issue: 4 (2019)

SUBJECTS

ENGINEERING AND CIVIL CONSTRUCTION
TECHNOLOGY

DOI

https://doi.org/10.3390/computers8040091
