Abstract

Evaluation of machine translation (MT) into morphologically rich languages has not been well studied despite its importance. This paper proposes a classifier for MT evaluation, namely a deep learning (DL) schema that combines different categories of information (linguistic features, natural language processing (NLP) metrics and embeddings), using a machine learning model designed for noisy and small datasets. The linguistic features are string based for the language pairs English (EN)–Greek (EL) and EN–Italian (IT). The paper also explores the linguistic differences that affect evaluation accuracy between different kinds of corpora. A comparative study is conducted between a simple (mathematically calculated) embedding layer and pre-trained embeddings. Moreover, the impact of feature selection and dimensionality reduction on classification accuracy is analyzed. Results show that a neural network (NN) model with different input representations clearly outperforms the state-of-the-art for MT evaluation for EN–EL and EN–IT, with an increase of almost 0.40 points in correlation with human judgments on pairwise MT evaluation. The proposed algorithm is observed to achieve better results on noisy and small datasets. In addition, for a more integrated analysis of the accuracy results, a qualitative linguistic analysis has been carried out in order to address complex linguistic phenomena.

Highlights

  • Machine translation (MT) applications have nowadays infiltrated almost every aspect of everyday life

  • In this experiment, (a) we investigate whether the predicted classifications correlate with human annotation, (b) we compare the proposed classification mechanism against the baseline classification models on small noisy and formal datasets, respectively, (c) we compare two different ways of generating the embedding layer, and (d) we test two different validation methods

  • It is more difficult for the classifier to choose the best MT output, because the Statistical Machine Translation (SMT) output is more similar to the neural machine translation (NMT) output in this corpus (C2)
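The two ways of generating the embedding layer mentioned in the highlights can be illustrated with a minimal sketch (not the paper's code; the toy vocabulary and vectors are invented for illustration): option (a) initializes the embedding matrix randomly and lets training adjust it, while option (b) copies rows from pre-trained word vectors. The lookup operation itself is identical in both cases.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = {"the": 0, "cat": 1, "sat": 2}
dim = 4

# Option (a): embedding matrix initialized randomly, to be learned during training
learned_emb = rng.normal(scale=0.1, size=(len(vocab), dim))

# Option (b): rows copied from hypothetical pre-trained vectors (e.g. word2vec),
# which can then be kept frozen or fine-tuned
pretrained = {"the": [0.1, 0.2, 0.0, 0.5],
              "cat": [0.9, 0.1, 0.3, 0.2],
              "sat": [0.4, 0.4, 0.1, 0.0]}
pretrained_emb = np.array([pretrained[w] for w in vocab])

def embed(tokens, table):
    # The lookup is a simple row selection, regardless of how the table was built
    return table[[vocab[t] for t in tokens]]

sentence = ["the", "cat", "sat"]
print(embed(sentence, learned_emb).shape)      # (3, 4)
print(embed(sentence, pretrained_emb).shape)   # (3, 4)
```

Either matrix feeds the same downstream classifier; the comparison in the paper is about which initialization yields better accuracy on small, noisy data.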

Summary

Introduction

Machine translation (MT) applications have nowadays infiltrated almost every aspect of everyday activities. Over the past few years, neural network (NN) models have improved the state-of-the-art of different natural language processing (NLP) applications [1], such as language modeling [2,3], improving answer ranking in community question answering [4], improving translation modeling [5,6,7], as well as evaluating machine translation output [4,8,9]. Word2vec quickly became the dominant approach for vectorizing textual data. NLP models that were already well studied under traditional approaches, such as latent semantic indexing (LSI) and vector representations using term frequency–inverse document frequency (TF-IDF) weighting, have been tested against word embeddings and, in most cases, word embeddings have come out on top. The research focus has accordingly shifted towards embedding approaches.
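The contrast between the traditional TF-IDF representation and embeddings can be made concrete with a minimal sketch (a toy corpus invented here, not the paper's data): TF-IDF vectors are sparse and high-dimensional, so two texts with no words in common are exactly orthogonal, which is one reason dense embeddings often win on semantic tasks.

```python
import math
from collections import Counter

# Toy corpus of tokenized "documents"
docs = [["machine", "translation", "evaluation"],
        ["neural", "machine", "translation"],
        ["question", "answering"]]

vocab = sorted({w for d in docs for w in d})
df = Counter(w for d in docs for w in set(d))   # document frequency per term
n_docs = len(docs)

def tfidf(doc):
    # Classic tf * idf weighting over the fixed vocabulary
    tf = Counter(doc)
    return [tf[w] / len(doc) * math.log(n_docs / df[w]) for w in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

v0, v1 = tfidf(docs[0]), tfidf(docs[1])
print(cosine(v0, v1))                  # positive: shared "machine translation"
print(cosine(v0, tfidf(docs[2])))      # 0.0: no vocabulary overlap
```

A dense embedding model, by contrast, could assign "evaluation" and "answering" nearby vectors learned from context, so documents with disjoint vocabularies need not score zero similarity.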
