Disentangled Feature Learning for Noise-Invariant Speech Enhancement

Soo Hyun Bae, Inkyu Choi and Nam Soo Kim

Abstract

Most recently proposed deep learning-based speech enhancement techniques have focused on designing neural network architectures that are treated as black boxes. However, it is often beneficial to understand what kinds of hidden representations the model has learned. Since real-world speech data are drawn from a generative process involving multiple entangled factors, disentangling the speech factor can help the trained model achieve better speech enhancement performance. Motivated by the recent success of neural networks in learning disentangled representations, we explore a framework for disentangling speech and noise, which has not been exploited in conventional speech enhancement algorithms. In this work, we propose a novel noise-invariant speech enhancement method that uses an adversarial training scheme to manipulate the latent features in the intermediate layers so that speech features are distinguished from noise features. To compare the proposed method with conventional algorithms, we conducted experiments under both matched and mismatched noise conditions using the TIMIT and TSPspeech datasets. Experimental results show that our model successfully disentangles the speech and noise latent features. Consequently, the proposed model not only achieves better enhancement performance but is also more robustly noise-invariant than conventional speech enhancement techniques.
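The abstract describes the adversarial scheme only at a high level. As a hedged illustration of one common way to set up such an objective, the sketch below uses the gradient-reversal formulation: a noise discriminator learns to predict a noise-type label from a latent feature z, while the encoder is updated to minimize the enhancement loss and, at the same time, maximize the discriminator's loss, weighted by λ. Everything here is an assumption for illustration (the scalar linear encoder, the binary noise label, the hypothetical names `losses_and_grads`, `encoder_grad` and `adversarial_step`, λ = 0.5 and the learning rate); it is not the authors' network or training recipe.

```python
import math


def sigmoid(a: float) -> float:
    """Logistic function used by the toy noise discriminator."""
    return 1.0 / (1.0 + math.exp(-a))


def losses_and_grads(x, s, y, w_e, w_s, w_d, b_d):
    """Forward pass and hand-derived gradients for a hypothetical scalar toy model.

    x: noisy input, s: clean target, y: noise-type label in {0, 1}.
    All parameters are scalars; this is NOT the paper's network.
    """
    z = w_e * x                          # latent feature from a linear "encoder"
    s_hat = w_s * z                      # enhancement head: estimate of clean speech
    p = sigmoid(w_d * z + b_d)           # discriminator: P(noise type = 1 | z)
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # clamp to avoid log(0)

    loss_enh = (s_hat - s) ** 2                                # enhancement (MSE) loss
    loss_adv = -(y * math.log(p) + (1 - y) * math.log(1 - p))  # discriminator BCE loss

    grads = {
        "w_s": 2.0 * (s_hat - s) * z,         # dL_enh / dw_s
        "w_d": (p - y) * z,                   # dL_adv / dw_d
        "b_d": p - y,                         # dL_adv / db_d
        "d_enh_dz": 2.0 * (s_hat - s) * w_s,  # dL_enh / dz
        "d_adv_dz": (p - y) * w_d,            # dL_adv / dz
    }
    return loss_enh, loss_adv, grads


def encoder_grad(d_enh_dz, d_adv_dz, x, lam):
    """Gradient of L_enh - lam * L_adv with respect to w_e (dz/dw_e = x).

    The minus sign is the gradient-reversal step: the encoder descends the
    enhancement loss but ascends the discriminator loss, pushing z to carry
    less information about the noise type.
    """
    return (d_enh_dz - lam * d_adv_dz) * x


def adversarial_step(params, x, s, y, lam=0.5, lr=0.1):
    """One simultaneous update of all parameters from a single forward pass."""
    loss_enh, loss_adv, g = losses_and_grads(x, s, y, **params)
    new = dict(params)
    new["w_d"] -= lr * g["w_d"]  # discriminator: minimize L_adv
    new["b_d"] -= lr * g["b_d"]
    new["w_s"] -= lr * g["w_s"]  # enhancement head: minimize L_enh
    new["w_e"] -= lr * encoder_grad(g["d_enh_dz"], g["d_adv_dz"], x, lam)
    return new, loss_enh, loss_adv


if __name__ == "__main__":
    params = {"w_e": 0.5, "w_s": 0.5, "w_d": 1.0, "b_d": 0.0}
    # Toy examples: (noisy input, clean target, noise-type label).
    data = [(1.0, 0.8, 0), (2.0, 1.6, 1), (-1.0, -0.8, 0), (-2.0, -1.6, 1)]
    for epoch in range(3):
        for x, s, y in data:
            params, l_enh, l_adv = adversarial_step(params, x, s, y)
        print(epoch, round(l_enh, 4), round(l_adv, 4))
```

Setting `lam` to 0 reduces the encoder update to ordinary enhancement training; larger values trade some enhancement accuracy for latent features that carry less noise-type information. In a real system the scalars would be replaced by neural network layers and the gradients would come from automatic differentiation rather than being derived by hand.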

Keywords

noise-invariant speech enhancement - disentangled feature learning - adversarial training - deep neural networks - noise reduction

PAGES

pp. 0 - 0

ISSUE

Volume: 9, Issue: 11 (2019)

SUBJECTS

CIVIL ENGINEERING AND CONSTRUCTION
TECHNOLOGY

DOI

https://doi.org/10.3390/app9112289
