Juan Carlos Atenco, Juan Carlos Moreno and Juan Manuel Ramirez
In this work we present a bimodal multitask network for audiovisual biometric recognition. The proposed network performs the fusion of features extracted from face and speech data through a weighted sum to jointly optimize the contribution of each modali...
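The weighted-sum feature fusion described in this abstract can be sketched roughly as follows; the embedding dimension, the softmax-normalized fusion weights, and all values here are illustrative assumptions, not the paper's actual configuration:

```python
import numpy as np

# Hypothetical sketch of weighted-sum fusion of face and speech features.
# In the actual network the fusion weights would be learned jointly with
# the multitask objectives; here they are toy fixed logits.

rng = np.random.default_rng(0)

D = 128                           # shared embedding dimension (assumed)
face_emb = rng.normal(size=D)     # stand-in for the face-branch features
speech_emb = rng.normal(size=D)   # stand-in for the speech-branch features

# One learnable scalar logit per modality, softmax-normalized so the
# modality contributions sum to 1 and can be optimized jointly.
logits = np.array([0.2, -0.1])
weights = np.exp(logits) / np.exp(logits).sum()

fused = weights[0] * face_emb + weights[1] * speech_emb

assert fused.shape == (D,)
assert np.isclose(weights.sum(), 1.0)
```

Normalizing the weights keeps the fused embedding on the same scale as the per-modality embeddings while still letting training shift emphasis between modalities.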
Wondimu Lambamo, Ramasamy Srinivasagan and Worku Jifara
Speaker recognition systems perform very well on datasets without noise or mismatch. However, performance degrades with environmental noise, channel variation, and physical and behavioral changes in the speaker. The types of Spea...
Jingwen Yang and Ruohua Zhou
Whisper speaker recognition (WSR) has received extensive attention from researchers in recent years, and it plays an important role in medical, judicial, and other fields. In particular, the establishment of a whisper dataset is essential for the study...
Seunguook Lim and Jihie Kim
Emotion recognition in conversation (ERC) is receiving increasing attention as interactions between humans and machines grow in a variety of services such as chat-bots and virtual assistants. As emotional expressions within a conversation can heav...
Fei Xie, Dalong Zhang and Chengming Liu
Transformer models are now widely used for speech processing tasks due to their powerful sequence modeling capabilities. Previous work determined an efficient way to model speaker embeddings using the Transformer model by combining transformers with conv...
Nikolaos Vryzas, Nikolaos Tsipas and Charalampos Dimoulas
Radio is evolving in a changing digital media ecosystem. Audio-on-demand has shaped the landscape of big unstructured audio data available online. In this paper, a framework for knowledge extraction is introduced, to improve discoverability and enrichmen...
Francesc Alías, Antonio Bonafonte and António Teixeira
The main goal of this Special Issue is to present the latest advances in research and novel applications of speech and language technologies based on the works presented at the IberSPEECH edition held in Barcelona in 2018, paying special attention to tho...
Shih-An Li, Yu-Ying Liu, Yun-Chien Chen, Hsuan-Ming Feng, Pi-Kang Shen and Yu-Che Wu
This paper presents a voice-interactive robot system that can conveniently execute assigned service tasks in real-life scenarios. It is equipped with a microphone so that users can control the robot with spoken commands; the voice commands are then reco...
Zuhragvl Aysa, Mijit Ablimit, Hankiz Yilahun and Askar Hamdulla
In multi-lingual, multi-speaker environments (e.g., international conference scenarios), speech, language, and background sounds can overlap. In real-world scenarios, source separation techniques are needed to separate target sounds. Downstream tasks, su...
Pavitra Patel, A. A. Chaudhari, M. A. Pund and D. H. Deshmukh
Pages 56-64
Speech emotion recognition is an important issue that affects human-machine interaction. Automatic recognition of human emotion in speech aims at identifying the underlying emotional state of a speaker from the speech signal. Gaussian mixture models...
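The Gaussian-mixture-model approach named in this abstract can be sketched as follows: one diagonal-covariance GMM per emotion class, with an utterance assigned to the emotion whose model gives the highest total frame log-likelihood. The parameters here are toy fixed values for illustration; in practice they would come from EM training on labeled speech:

```python
import numpy as np

def gmm_loglik(frames, weights, means, variances):
    """Sum over frames of log sum_k w_k * N(x | mu_k, diag(var_k))."""
    total = 0.0
    for x in frames:
        comp = []
        for w, mu, var in zip(weights, means, variances):
            log_n = -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)
            comp.append(np.log(w) + log_n)
        m = max(comp)  # log-sum-exp over components for numerical stability
        total += m + np.log(sum(np.exp(c - m) for c in comp))
    return total

rng = np.random.default_rng(1)
frames = rng.normal(loc=1.0, size=(50, 4))  # toy MFCC-like feature frames

# Two-component GMM per emotion; toy parameters, not trained values.
models = {
    "neutral": (np.array([0.5, 0.5]), np.zeros((2, 4)), np.ones((2, 4))),
    "angry":   (np.array([0.5, 0.5]), np.ones((2, 4)),  np.ones((2, 4))),
}
scores = {emo: gmm_loglik(frames, *p) for emo, p in models.items()}
best = max(scores, key=scores.get)
print(best)  # frames drawn near mean 1.0 match the "angry" model
```

Summing per-frame log-likelihoods treats frames as independent given the emotion, which is the standard GMM classification assumption.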
Driss Khalil, Amrutha Prasad, Petr Motlicek, Juan Zuluaga-Gomez, Iuliia Nigmatulina, Srikanth Madikeri and Christof Schuepbach
In air traffic management (ATM), voice communications are critical for ensuring the safe and efficient operation of aircraft. The pertinent voice communications, between the air traffic controller (ATCo) and the pilot, are usually transmitted in a single channel, which po...
Xiao Xu, Xuehan Zhang, Zhongxu Bao, Xiaojie Yu, Yuqing Yin, Xu Yang and Qiang Niu
Hand gesture recognition is an essential Human-Computer Interaction (HCI) mechanism for users to control smart devices. While traditional device-based methods support acceptable recognition performance, the recent advance in wireless sensing could enable...
Abdelfatah Ahmed, Mohamed Bader, Ismail Shahin, Ali Bou Nassif, Naoufel Werghi and Mohammad Basel
The Arabic language has always been an immense source of attraction to various people from different ethnicities by virtue of the significant linguistic legacy that it possesses. Consequently, a multitude of people from all over the world are yearning to...
Hanif Fakhrurroja, Carmadi Machbub, Ary Setijadi Prihatmanto and Ayu Purwarianti
Pages 44-67
This paper proposes a way to control home appliances using a multimodal interaction system based on speech, gestures, and smartphone applications. Speech, in the Indonesian language, and gestures from users are captured with a Kinect v2 sensor. Speech reco...
Rania M. Ghoniem, Abeer D. Algarni and Khaled Shaalan
In multi-modal emotion-aware frameworks, it is essential to estimate the emotional features and then fuse them to different degrees. This basically follows either a feature-level or a decision-level strategy. In all likelihood, while features from several moda...
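The two fusion strategies this abstract contrasts can be sketched side by side; the modality names, feature dimensions, class posteriors, and combination weights below are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
speech_feat = rng.normal(size=40)  # e.g., prosodic/spectral statistics
face_feat = rng.normal(size=60)    # e.g., facial expression descriptors

# Feature-level fusion: concatenate modality features before a single
# downstream classifier sees them.
fused_features = np.concatenate([speech_feat, face_feat])
assert fused_features.shape == (100,)

# Decision-level fusion: each modality produces its own class posteriors,
# which are combined afterwards, here by a weighted average (toy weights
# that sum to 1 so the result remains a valid distribution).
speech_post = np.array([0.7, 0.2, 0.1])
face_post = np.array([0.4, 0.5, 0.1])
w_speech, w_face = 0.6, 0.4
fused_post = w_speech * speech_post + w_face * face_post

assert np.isclose(fused_post.sum(), 1.0)
print(int(np.argmax(fused_post)))  # class 0 wins under these weights
```

Feature-level fusion lets a single model learn cross-modal interactions, while decision-level fusion keeps per-modality models independent and only blends their outputs.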
Mauro Zaninelli
A portable wireless device with a "vocal commands" feature for activating the mechanical milking phase in conventional milking parlors was developed and tested to increase the level of automation in the milking procedures. The device was tested in the la...
Lizhen Jia, Yanyan Xu and Dengfeng Ke
Recent speech enhancement studies have mostly focused on completely separating noise from human voices. Due to the lack of specific structures for harmonic fitting in previous studies and the limitations of the traditional convolutional receptive field, ...
Miodrag D. Kušljević and Vladimir V. Vujičić
Although voiced speech signals are physical signals that are only approximately harmonic, whereas electric power signals are truly harmonic, the algorithms used for harmonic analysis in electric power systems can be successfully used in speech processing, includin...
Sara Sekkate, Mohammed Khalil, Abdellah Adib and Sofia Ben Jebara
Because one of the key issues in improving the performance of Speech Emotion Recognition (SER) systems is the choice of an effective feature representation, most of the research has focused on developing feature-level fusion using a large set of featur...
Esther Rituerto-González, Alba Mínguez-Sánchez, Ascensión Gallardo-Antolín and Carmen Peláez-Moreno
A Speaker Identification system for a personalized wearable device to combat gender-based violence is presented in this paper. Speaker recognition systems exhibit a decrease in performance when the user is under emotional or stress conditions, thus the o...