Samuel R. Schrader and Eren Gultepe
The evaluation of similarities between natural languages often relies on prior knowledge of the languages being studied. We describe three methods for building phylogenetic trees and clustering languages without the use of language-specific information. ...

Tharindu Ranasinghe and Marcos Zampieri
The pervasiveness of offensive content in social media has become an important reason for concern for online platforms. With the aim of improving online safety, a large number of studies applying computational models to identify such content have been pu...

Mikel Penagarikano, Amparo Varona, Germán Bordel and Luis Javier Rodriguez-Fuentes
In this paper, a semisupervised speech data extraction method is presented and applied to create a new dataset designed for the development of fully bilingual Automatic Speech Recognition (ASR) systems for Basque and Spanish. The dataset is drawn from an...

Dezhi Cao, Yue Zhao and Licheng Wu
Data-driven construction of pronunciation dictionaries relies on extensive, high-quality training data. However, manually annotating a corpus for this purpose is both costly and time-consuming, especially for low-resource languages that ...

Simone Leonardi, Diego Monti, Giuseppe Rizzo and Maurizio Morisio
Intelligent agents have the potential to understand the personality traits of human beings because of their everyday interaction with us. Assessing our psychological traits is useful when we require agents to simulate empathy. Since the creatio...

Kwee Teck See, Bava Harji Madhubala and Ah Choo Koo
pp. 20-36
The use of mobile devices for language learning, known as Mobile Assisted Language Learning (MALL), has been found to motivate children to read digital print. However, parents need to be convinced of the benefits of this new technology-assisted learning ...

Saida Mussakhojayeva, Kaisar Dauletbek, Rustem Yeshpanov and Huseyin Atakan Varol
The primary aim of this study was to contribute to the development of multilingual automatic speech recognition for lower-resourced Turkic languages. Ten languages (Azerbaijani, Bashkir, Chuvash, Kazakh, Kyrgyz, Sakha, Tatar, Turkish, Uyghur, and Uzbek) we...

Seid Muhie Yimam, Abinew Ali Ayele, Gopalakrishnan Venkatesh, Ibrahim Gashaw and Chris Biemann
The availability of different pre-trained semantic models has enabled the quick development of machine learning components for downstream applications. However, even if texts are abundant for low-resource languages, there are very few semantic models pub...

Fenfang Li, Zhengzhang Zhao, Li Wang and Han Deng
Sentence Boundary Disambiguation (SBD) is crucial for building datasets for tasks such as machine translation, syntactic analysis, and semantic analysis. Currently, most automatic sentence segmentation for Tibetan adopts rule-based and stat...

Yonghua Wen, Junjun Guo, Zhiqiang Yu and Zhengtao Yu
Parallel sentences play a crucial role in various NLP tasks, particularly for cross-lingual tasks such as machine translation. However, due to the time-consuming and laborious nature of manual construction, many low-resource languages still suffer from a...

Rigas Kotsakis, Maria Matsiola, George Kalliris and Charalampos Dimoulas
This paper investigates spoken-language classification in audio broadcasting content. The approach reflects a real-world scenario, encountered in modern media/monitoring organizations, where semi-automated indexing/documentation ...

Youngki Park and Youhyun Shin
This paper presents a novel approach for finding the most semantically similar conversational sentences in Korean and English. Our method involves training separate embedding models for each language and using a hybrid algorithm that selects the appropri...

Chuyang Yang and Chenyu Huang
Advanced digital data-driven applications have evolved and significantly impacted the transportation sector in recent years. This systematic review examines natural language processing (NLP) approaches applied to aviation safety-related domains. The auth...

Roberta Rodrigues de Lima, Anita M. R. Fernandes, James Roberto Bombasar, Bruno Alves da Silva, Paul Crocker and Valderi Reis Quietinho Leithardt
Classification problems are common activities in many different domains and supervised learning algorithms have shown great promise in these areas. The classification of goods in international trade in Brazil represents a real challenge due to the comple...

Guizhe Song, Degen Huang and Zhifeng Xiao
Multilingual characteristics, lack of annotated data, and imbalanced sample distribution are the three main challenges for toxic comment analysis in a multilingual setting. This paper proposes a multilingual toxic text classifier which adopts a novel fus...

Hanafi bin Dollah, Mohd Feham Md Ghalib, Muhammad Sabri bin Sahrir, Rusni Hassan, Abdul Wahab Zakaria and Zakaria Omar
pp. 145-161
Mobile technology can now be integrated with various forms of learning material, such as electronic books and digital references in the form of dictionaries or encyclopaedias. The expansion of Islamic banking practices through various...

Kirill Tyshchuk, Polina Karpikova, Andrew Spiridonov, Anastasiia Prutianova, Anton Razzhigaev and Alexander Panchenko
Embeddings, i.e., vector representations of objects, such as texts, images, or graphs, play a key role in deep learning methodologies nowadays. Prior research has shown the importance of analyzing the isotropy of textual embeddings for transformer-based ...

Valery Solovyev and Vladimir Ivanov
Much theoretical and applied research in cognitive science and neurophysiology requires larger vocabularies with concreteness/abstractness ratings. Since creating such dictionaries by interviewing informants is labor-intensive, consid...

Sonali Rajesh Shah, Abhishek Kaushik, Shubham Sharma and Janice Shah
YouTube is a boon: through it, people can educate, entertain, and express themselves on various topics. YouTube India currently has millions of active users. With so many active users, it can be understood that the data present on the Yo...

Taufik Fuadi Abidin, Amir Mahazir, Muhammad Subianto, Khairul Munadi and Ridha Ferdhiana
Over the past decades, intelligent identification of acronym-expansion pairs in large corpora has garnered considerable research attention, particularly in the fields of text mining, entity extraction, and information retrieval. Herein, we p...