Abstract

Evaluation of machine translation (MT) into morphologically rich languages has not been well studied despite its importance. This paper proposes a classifier for MT evaluation, namely a deep learning (DL) schema that combines different categories of information (linguistic features, natural language processing (NLP) metrics and embeddings), using a machine learning model designed for noisy and small datasets. The linguistic features are string based for the language pairs English (EN)–Greek (EL) and EN–Italian (IT). The paper also explores the linguistic differences that affect evaluation accuracy between different kinds of corpora. A comparative study is conducted between a simple (mathematically calculated) embedding layer and pre-trained embeddings. Moreover, the impact of feature selection and dimensionality reduction on classification accuracy is analyzed. Results show that a neural network (NN) model with different input representations clearly outperforms the state-of-the-art for MT evaluation for EN–EL and EN–IT, with an increase of almost 0.40 points in correlation with human judgments on pairwise MT evaluation. The proposed algorithm is observed to achieve better results on noisy and small datasets. In addition, for a more integrated analysis of the accuracy results, a qualitative linguistic analysis has been carried out in order to address complex linguistic phenomena.

Highlights

  • Machine translation (MT) applications have nowadays infiltrated almost every aspect of everyday life

  • In this experiment, (a) we investigate whether the predicted classifications correlate with human annotation, (b) we compare the proposed classification mechanism against the baseline classification models on small noisy and formal datasets, respectively, (c) we compare two different ways of generating the embedding layer, and (d) we test two different validation methods

  • It is more difficult for the classifier to choose the best MT output, because the Statistical Machine Translation (SMT) output is more similar to the neural machine translation (NMT) output in this corpus (C2)
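The two ways of generating the embedding layer mentioned in the highlights can be illustrated with a minimal sketch (not the paper's code; the toy vocabulary and vectors are invented for illustration): option (a) initializes the embedding matrix randomly and lets training adjust it, while option (b) copies rows from pre-trained word vectors. The lookup operation itself is identical in both cases.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = {"the": 0, "cat": 1, "sat": 2}
dim = 4

# Option (a): embedding matrix initialized randomly, to be learned during training
learned_emb = rng.normal(scale=0.1, size=(len(vocab), dim))

# Option (b): rows copied from hypothetical pre-trained vectors (e.g. word2vec),
# which can then be kept frozen or fine-tuned
pretrained = {"the": [0.1, 0.2, 0.0, 0.5],
              "cat": [0.9, 0.1, 0.3, 0.2],
              "sat": [0.4, 0.4, 0.1, 0.0]}
pretrained_emb = np.array([pretrained[w] for w in vocab])

def embed(tokens, table):
    # The lookup is a simple row selection, regardless of how the table was built
    return table[[vocab[t] for t in tokens]]

sentence = ["the", "cat", "sat"]
print(embed(sentence, learned_emb).shape)      # (3, 4)
print(embed(sentence, pretrained_emb).shape)   # (3, 4)
```

Either matrix feeds the same downstream classifier; the comparison in the paper is about which initialization yields better accuracy on small, noisy data.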

Summary

Introduction

Machine translation (MT) applications have nowadays infiltrated almost every aspect of everyday activities. Over the past few years, neural network (NN) models have improved the state-of-the-art of different natural language processing (NLP) applications [1], such as language modeling [2,3], improving answer ranking in community question answering [4], improving translation modeling [5,6,7], as well as evaluating machine translation output [4,8,9]. Word2vec quickly became the dominant approach for vectorizing textual data. NLP models that were already well studied under traditional approaches, such as latent semantic indexing (LSI) and vector representations using term frequency–inverse document frequency (TF-IDF) weighting, have been tested against word embeddings and, in most cases, word embeddings have come out on top. The research focus has accordingly shifted towards embedding approaches.
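The contrast between the traditional TF-IDF representation and embeddings can be made concrete with a minimal sketch (a toy corpus invented here, not the paper's data): TF-IDF vectors are sparse and high-dimensional, so two texts with no words in common are exactly orthogonal, which is one reason dense embeddings often win on semantic tasks.

```python
import math
from collections import Counter

# Toy corpus of tokenized "documents"
docs = [["machine", "translation", "evaluation"],
        ["neural", "machine", "translation"],
        ["question", "answering"]]

vocab = sorted({w for d in docs for w in d})
df = Counter(w for d in docs for w in set(d))   # document frequency per term
n_docs = len(docs)

def tfidf(doc):
    # Classic tf * idf weighting over the fixed vocabulary
    tf = Counter(doc)
    return [tf[w] / len(doc) * math.log(n_docs / df[w]) for w in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

v0, v1 = tfidf(docs[0]), tfidf(docs[1])
print(cosine(v0, v1))                  # positive: shared "machine translation"
print(cosine(v0, tfidf(docs[2])))      # 0.0: no vocabulary overlap
```

A dense embedding model, by contrast, could assign "evaluation" and "answering" nearby vectors learned from context, so documents with disjoint vocabularies need not score zero similarity.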
