21 Articles

Page 1 of 2

Analyzing Indo-European Language Similarities Using Document Vectors

Online access

Samuel R. Schrader and Eren Gultepe

The evaluation of similarities between natural languages often relies on prior knowledge of the languages being studied. We describe three methods for building phylogenetic trees and clustering languages without the use of language-specific information. ...

Journal: Informatics Format: Electronic

Table of contents: Vol: 10 Num: 0 Par: 4 Year: 2023

An Evaluation of Multilingual Offensive Language Identification Methods for the Languages of India

Online access

Tharindu Ranasinghe and Marcos Zampieri

The pervasiveness of offensive content in social media has become an important reason for concern for online platforms. With the aim of improving online safety, a large number of studies applying computational models to identify such content have been pu...

Journal: Information Format: Electronic

Table of contents: Vol: 12 Num: 0 Par: 8 Year: 2021

Semisupervised Speech Data Extraction from Basque Parliament Sessions and Validation on Fully Bilingual Basque–Spanish ASR

Online access

Mikel Penagarikano, Amparo Varona, Germán Bordel and Luis Javier Rodriguez-Fuentes

In this paper, a semisupervised speech data extraction method is presented and applied to create a new dataset designed for the development of fully bilingual Automatic Speech Recognition (ASR) systems for Basque and Spanish. The dataset is drawn from an...

Journal: Applied Sciences Format: Electronic

Table of contents: Vol: 13 Num: 0 Par: 14 Year: 2023

Near-Optimal Active Learning for Multilingual Grapheme-to-Phoneme Conversion

Online access

Dezhi Cao, Yue Zhao and Licheng Wu

The data-driven construction of pronunciation dictionaries relies on high-quality and extensive training data. However, the manual annotation of a corpus for this purpose is both costly and time consuming, especially for low-resource languages that ...

Journal: Applied Sciences Format: Electronic

Table of contents: Vol: 13 Num: 0 Par: 16 Year: 2023

Multilingual Transformer-Based Personality Traits Estimation

Online access

Simone Leonardi, Diego Monti, Giuseppe Rizzo and Maurizio Morisio

Intelligent agents have the potential to understand personality traits of human beings because of their everyday interaction with us. The assessment of our psychological traits is a useful tool when we require them to simulate empathy. Since the creatio...

Journal: Information Format: Electronic

Table of contents: Vol: 11 Num: 0 Par: 4 Year: 2020

Motivation of Parents Towards Reading Multilingual eBooks To Pre-School Children

Online access

Kwee Teck See, Bava Harji Madhubala, Ah Choo Koo. Pages: 20-36

The use of mobile devices for language learning, under Mobile Assisted Language Learning (MALL), has been found to motivate children to read digital print. However, parents need to be convinced of the benefits of this new technology-assisted learning ...

Journal: International Journal of Interactive Mobile Technologies (iJIM) Format: Electronic

Table of contents: Vol: 13 Num: 01 Par: 0 Year: 2019

Multilingual Speech Recognition for Turkic Languages

Online access

Saida Mussakhojayeva, Kaisar Dauletbek, Rustem Yeshpanov and Huseyin Atakan Varol

The primary aim of this study was to contribute to the development of multilingual automatic speech recognition for lower-resourced Turkic languages. Ten languages (Azerbaijani, Bashkir, Chuvash, Kazakh, Kyrgyz, Sakha, Tatar, Turkish, Uyghur, and Uzbek) we...

Journal: Information Format: Electronic

Table of contents: Vol: 14 Num: 0 Par: 2 Year: 2023

Introducing Various Semantic Models for Amharic: Experimentation and Evaluation with Multiple Tasks and Datasets

Online access

Seid Muhie Yimam, Abinew Ali Ayele, Gopalakrishnan Venkatesh, Ibrahim Gashaw and Chris Biemann

The availability of different pre-trained semantic models has enabled the quick development of machine learning components for downstream applications. However, even if texts are abundant for low-resource languages, there are very few semantic models pub...

Journal: Future Internet Format: Electronic

Table of contents: Vol: 13 Num: 0 Par: 11 Year: 2021

Tibetan Sentence Boundaries Automatic Disambiguation Based on Bidirectional Encoder Representations from Transformers on Byte Pair Encoding Word Cutting Method

Online access

Fenfang Li, Zhengzhang Zhao, Li Wang and Han Deng

Sentence Boundary Disambiguation (SBD) is crucial for building datasets for tasks such as machine translation, syntactic analysis, and semantic analysis. Currently, most automatic sentence segmentation in Tibetan adopts the methods of rule-based and stat...

Journal: Applied Sciences Format: Electronic

Table of contents: Vol: 14 Num: 0 Par: 7 Year: 2024

Chinese–Vietnamese Pseudo-Parallel Sentences Extraction Based on Image Information Fusion

Online access

Yonghua Wen, Junjun Guo, Zhiqiang Yu and Zhengtao Yu

Parallel sentences play a crucial role in various NLP tasks, particularly for cross-lingual tasks such as machine translation. However, due to the time-consuming and laborious nature of manual construction, many low-resource languages still suffer from a...

Journal: Information Format: Electronic

Table of contents: Vol: 14 Num: 0 Par: 5 Year: 2023

Investigation of Spoken-Language Detection and Classification in Broadcasted Audio Content

Online access

Rigas Kotsakis, Maria Matsiola, George Kalliris and Charalampos Dimoulas

The current paper focuses on the investigation of spoken-language classification in audio broadcasting content. The approach reflects a real-world scenario, encountered in modern media/monitoring organizations, where semi-automated indexing/documentation ...

Journal: Information Format: Electronic

Table of contents: Vol: 11 Num: 0 Par: 4 Year: 2020

Using Multiple Monolingual Models for Efficiently Embedding Korean and English Conversational Sentences

Online access

Youngki Park and Youhyun Shin

This paper presents a novel approach for finding the most semantically similar conversational sentences in Korean and English. Our method involves training separate embedding models for each language and using a hybrid algorithm that selects the appropri...

Journal: Applied Sciences Format: Electronic

Table of contents: Vol: 13 Num: 0 Par: 9 Year: 2023

Natural Language Processing (NLP) in Aviation Safety: Systematic Review of Research and Outlook into the Future

Online access

Chuyang Yang and Chenyu Huang

Advanced digital data-driven applications have evolved and significantly impacted the transportation sector in recent years. This systematic review examines natural language processing (NLP) approaches applied to aviation safety-related domains. The auth...

Journal: Aerospace Format: Electronic

Table of contents: Vol: 10 Num: 0 Par: 7 Year: 2023

An Empirical Comparison of Portuguese and Multilingual BERT Models for Auto-Classification of NCM Codes in International Trade

Online access

Roberta Rodrigues de Lima, Anita M. R. Fernandes, James Roberto Bombasar, Bruno Alves da Silva, Paul Crocker and Valderi Reis Quietinho Leithardt

Classification problems are common in many different domains, and supervised learning algorithms have shown great promise in these areas. The classification of goods in international trade in Brazil represents a real challenge due to the comple...

Journal: Big Data and Cognitive Computing Format: Electronic

Table of contents: Vol: 6 Num: 0 Par: 1 Year: 2022

A Study of Multilingual Toxic Text Detection Approaches under Imbalanced Sample Distribution

Online access

Guizhe Song, Degen Huang and Zhifeng Xiao

Multilingual characteristics, lack of annotated data, and imbalanced sample distribution are the three main challenges for toxic comment analysis in a multilingual setting. This paper proposes a multilingual toxic text classifier which adopts a novel fus...

Journal: Information Format: Electronic

Table of contents: Vol: 12 Num: 0 Par: 5 Year: 2021

Prototype Development of Mobile App for Trilingual Islamic Banking and Finance Glossary of Terms via iOS and Android Based Devices

Online access

Hanafi bin Dollah, Mohd Feham Md Ghalib, Muhammad Sabri bin Sahrir, Rusni Hassan, Abdul Wahab Zakaria, Zakaria Omar. Pages: 145-161

Mobile technology in use today can be integrated with various forms of learning materials, such as electronic books and digital references in the form of a dictionary or encyclopaedia. The expansion of Islamic banking practices through various...

Journal: International Journal of Interactive Mobile Technologies (iJIM) Format: Electronic

Table of contents: Vol: 11 Num: 3 Par: 0 Year: 2017

On Isotropy of Multimodal Embeddings

Online access

Kirill Tyshchuk, Polina Karpikova, Andrew Spiridonov, Anastasiia Prutianova, Anton Razzhigaev and Alexander Panchenko

Embeddings, i.e., vector representations of objects, such as texts, images, or graphs, play a key role in deep learning methodologies nowadays. Prior research has shown the importance of analyzing the isotropy of textual embeddings for transformer-based ...

Journal: Information Format: Electronic

Table of contents: Vol: 14 Num: 0 Par: 7 Year: 2023

Extrapolation of Human Estimates of the Concreteness/Abstractness of Words by Neural Networks of Various Architectures

Online access

Valery Solovyev and Vladimir Ivanov

In a great deal of theoretical and applied cognitive and neurophysiological research, it is essential to have more vocabularies with concreteness/abstractness ratings. Since creating such dictionaries by interviewing informants is labor-intensive, consid...

Journal: Applied Sciences Format: Electronic

Table of contents: Vol: 12 Num: 0 Par: 9 Year: 2022

Opinion-Mining on Marglish and Devanagari Comments of YouTube Cookery Channels Using Parametric and Non-Parametric Learning Models

Online access

Sonali Rajesh Shah, Abhishek Kaushik, Shubham Sharma and Janice Shah

YouTube is a boon, and through it people can educate, entertain, and express themselves about various topics. YouTube India currently has millions of active users. As there are millions of active users, it can be understood that the data present on the Yo...

Journal: Big Data and Cognitive Computing Format: Electronic

Table of contents: Vol: 4 Num: 0 Par: 1 Year: 2020

Recognizing Indonesian Acronym and Expansion Pairs with Supervised Learning and MapReduce

Online access

Taufik Fuadi Abidin, Amir Mahazir, Muhammad Subianto, Khairul Munadi and Ridha Ferdhiana

During the previous decades, intelligent identification of acronym and expansion pairs from a large corpus has garnered considerable research attention, particularly in the fields of text mining, entity extraction, and information retrieval. Herein, we p...

Journal: Information Format: Electronic

Table of contents: Vol: 11 Num: 0 Par: 4 Year: 2020
