Ensemble and Deep Learning for Language-Independent Automatic Selection of Parallel Data

Despoina Mouratidis and Katia Lida Kermanidis

Abstract

Machine translation is used in many everyday applications. As the number of translated documents that must be labeled as useful or not for building a translation model grows, automated text categorization (classification) has become a popular research field in machine learning. This kind of information can be quite helpful for machine translation. Our parallel corpora (English-Greek and English-Italian) are based on educational data, which are quite difficult to translate. We apply two state-of-the-art architectures, Random Forest (RF) and Deeplearning4j (DL4J), to our data, which consist of three translation outputs. To our knowledge, this is the first time that deep learning architectures have been applied to the automatic selection of parallel data. We also propose new string-based features that seem to be effective for the classifier, and we investigate whether an attribute selection method could improve classification accuracy. Experimental results indicate an increase of up to 4% (compared to our previous work) using RF and rather satisfactory results using DL4J.

Keywords

machine learning - deep learning - education data - data selection - machine translation - DL4J deep learning architecture - random forest

ISSUE

Volume: 12, Issue: 1 (2019)

SUBJECTS

CIVIL ENGINEERING AND CONSTRUCTION
TECHNOLOGY

DOI

https://doi.org/10.3390/a12010026
