Whispered Speech Conversion Based on the Inversion of Mel Frequency Cepstral Coefficient Features

Qiang Zhu, Zhong Wang, Yunfeng Dou and Jian Zhou

Abstract

This paper proposes a method for converting whispered speech into normal speech based on the inversion of Mel frequency cepstral coefficient (MFCC) features. First, MFCC features are extracted from whispered speech and normal speech, and a Gaussian mixture model (GMM) is used to establish the mapping between the MFCC feature parameters of the two. Then, the normal-speech MFCC parameters corresponding to the whispered speech are obtained from the GMM. Finally, the whispered speech is converted into normal speech by inverting the MFCC features. Experimental results show that the cepstral distortion (CD) of the speech converted by the proposed method is 21% lower than that of speech converted using linear predictive coefficient (LPC) features, and that the mean opinion score (MOS) is 3.56, indicating a satisfactory outcome in both intelligibility and sound quality.
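The GMM mapping step can be illustrated with a short sketch. The abstract does not state which GMM formulation the authors use, so the code below assumes the joint-density form common in voice conversion: a GMM is fitted to stacked vectors [x; y] (whispered MFCCs x, normal MFCCs y), and each whispered frame is converted by the conditional expectation E[y | x]. The function names, the parameter layout, and the toy parameters are all hypothetical and chosen only for illustration. MFCC extraction and inversion are not shown.

```python
import numpy as np


def gaussian_logpdf(x, mean, cov):
    """Log density of a multivariate normal evaluated at one vector x."""
    d = x.shape[0]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)


def convert_frame(x, weights, means, covs):
    """Estimate a normal-speech feature vector from a whispered one.

    Assumed layout (hypothetical, not taken from the paper):
      weights: (K,)        mixture weights
      means:   (K, 2D)     joint means [mu_x; mu_y]
      covs:    (K, 2D, 2D) joint covariances [[S_xx, S_xy], [S_yx, S_yy]]
    Returns E[y | x] under the joint GMM.
    """
    d = x.shape[0]
    # Posterior probability of each component given the source frame x.
    log_post = np.array([
        np.log(w) + gaussian_logpdf(x, m[:d], c[:d, :d])
        for w, m, c in zip(weights, means, covs)
    ])
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    # Posterior-weighted sum of per-component linear regressions.
    y = np.zeros(means.shape[1] - d)
    for p, m, c in zip(post, means, covs):
        y += p * (m[d:] + c[d:, :d] @ np.linalg.solve(c[:d, :d], x - m[:d]))
    return y


# Toy single-component model with D = 2 (made-up numbers, not real MFCC data).
eye = np.eye(2)
toy_weights = np.array([1.0])
toy_means = np.array([[0.0, 0.0, 1.0, 1.0]])
toy_covs = np.array([np.block([[eye, 2.0 * eye], [2.0 * eye, 5.0 * eye]])])

print(convert_frame(np.array([1.0, -1.0]), toy_weights, toy_means, toy_covs))
```

With a single component the mapping reduces to one linear regression, mu_y + S_yx S_xx^{-1} (x - mu_x), which makes the toy case easy to check by hand. In practice the joint GMM would be trained on time-aligned whispered and normal frames.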

Keywords

whispered speech conversion - MFCC feature inversion - Gaussian mixture model - cepstral distortion

ISSUE

Volume: 15 Issue: 2 (2022)

SUBJECTS

ENGINEERING AND CIVIL CONSTRUCTION
TECHNOLOGY

DOI

https://doi.org/10.3390/a15020068
