Santi, Yoshitaka Nakajima, Kazuo Ueda and Gerard B. Remijn
Mosaic speech is degraded speech that is segmented into time × frequency blocks. Earlier research with Japanese mosaic speech has shown that its intelligibility is almost perfect for mosaic block durations (MBD) up to 40 ms. The purpose of the present st...
Jerry D. Gibson
Speech compression is a key technology underlying digital cellular communications, VoIP, voicemail, and voice response systems. We trace the evolution of speech coding based on the linear prediction model, highlight the key milestones in speech coding, a...
Rui Cheng, Changchun Bao and Zihao Cui
The proposed MASS can simulate multiple signals collected by a microphone array in a room acoustic environment for multi-channel speech coding and enhancement.
Jaco Badenhorst and Febe de Wet
When the National Centre for Human Language Technology (NCHLT) Speech corpus was released, it created various opportunities for speech technology development in the 11 official, but critically under-resourced, languages of South Africa. Since then, the s...
Olga Lucia Ramos Sandoval, Erika Nathalia Gamma Melo, Dario Amaya Hurtado
pp. 287-298
Saida Mussakhojayeva, Kaisar Dauletbek, Rustem Yeshpanov and Huseyin Atakan Varol
The primary aim of this study was to contribute to the development of multilingual automatic speech recognition for lower-resourced Turkic languages. Ten languages (Azerbaijani, Bashkir, Chuvash, Kazakh, Kyrgyz, Sakha, Tatar, Turkish, Uyghur, and Uzbek) we...
Liangliang Cheng, Yunfeng Dou, Jian Zhou, Huabin Wang and Liang Tao
Because of its acoustic characteristics, bone-conducted (BC) speech can be enhanced to support communication in complex, high-noise environments. Existing BC speech enhancement models have weak spectral recovery capability for the high-fre...
Qiang Zhu, Zhong Wang, Yunfeng Dou and Jian Zhou
A conversion method based on the inversion of Mel frequency cepstral coefficient (MFCC) features was proposed to convert whispered speech into normal speech. First, the MFCC features of whispered speech and normal speech were extracted and a matching rel...
Seung-Jun Lee and Hyuk-Yoon Kwon
In this paper, we propose a preprocessing strategy for denoising of speech data based on speech segment detection. A design of computationally efficient speech denoising is necessary to develop a scalable method for large-scale data sets. Furthermore, it...
Gyuseok Park, Woohyeong Cho, Kyu-Sung Kim and Sangmin Lee
Hearing aids are small electronic devices designed to improve hearing for persons with impaired hearing, using sophisticated audio signal processing algorithms and technologies. In general, the speech enhancement algorithms in hearing aids remove the env...
Hyeon-Kyu Noh and Hong-June Park
A convolutional neural network (CNN) transducer decoder was proposed to reduce the decoding time of an end-to-end automatic speech recognition (ASR) system while maintaining accuracy. The CNN of 177 k parameters and a kernel size of 6 generates the proba...
Haohan Shi, Xiyu Shi and Safak Dogan
Audio inpainting plays an important role in addressing incomplete, damaged, or missing audio signals, contributing to improved quality of service and overall user experience in multimedia communications over the Internet and mobile networks. This paper p...
Nurgali Kadyrbek, Madina Mansurova, Adai Shomanov and Gaukhar Makharova
This study is devoted to the transcription of human speech in the Kazakh language in dynamically changing conditions. It discusses key aspects related to the phonetic structure of the Kazakh language, technical considerations in collecting the transcribe...
Makito Kawata, Mariko Tsuruta-Hamamura and Hiroshi Hasegawa
Understanding the impact of room acoustics on non-native listeners is crucial, particularly in standardized English as a foreign language (EFL) proficiency testing environments. This study aims to elucidate how acoustics influence test scores, considerin...
Luis Gomez-Agustina, Haydar Aygun and Liji Suseela Thankom Mohan
Objective speech intelligibility estimations undertaken in natural acoustics speech communications (NAS) scenarios require the utilization of a speech source that approximates the acoustic characteristics of a human talker. Only a limited number of speci...
Chengkai Cai, Kenta Iwai and Takanobu Nishiura
The development of distant-talk measurement systems has been attracting attention since they can be applied to many situations such as security and disaster relief. One such system that uses a device called a laser Doppler vibrometer (LDV) to acquire sou...
Mikel Penagarikano, Amparo Varona, Germán Bordel and Luis Javier Rodriguez-Fuentes
In this paper, a semisupervised speech data extraction method is presented and applied to create a new dataset designed for the development of fully bilingual Automatic Speech Recognition (ASR) systems for Basque and Spanish. The dataset is drawn from an...
Alejandro Molina-Villegas, Thomas Cattin, Karina Gazca-Hernandez and Edwin Aldana-Bobadilla
Currently, a significant portion of published research on online hate speech relies on existing textual corpora. However, when examining a specific context, there is a lack of preexisting datasets that include the particularities associated with various ...
Dan Ungureanu, Stefan-Adrian Toma, Ion-Dorinel Filip, Bogdan-Costel Mocanu, Iulian Aciobăniței, Bogdan Marghescu, Titus Balan, Mihai Dascalu, Ion Bica and Florin Pop
The evolution of Natural Language Processing technologies transformed them into viable choices for various accessibility features and for facilitating interactions between humans and computers. A subset of them consists of speech processing systems, such...
Jialin Zhang, Mairidan Wushouer, Gulanbaier Tuerhong and Hanfang Wang
Emotional speech synthesis is an important branch of human-computer interaction technology that aims to generate emotionally expressive and comprehensible speech based on the input text. With the rapid development of speech synthesis technology based on ...