Big Data and Machine Learning for Evaluating Machine Translation

Rashmi Agrawal, Simran Kaur Jolly

doi:10.1007/978-3-030-39119-5_2



Abstract

Human evaluation of machine translation is the most important aspect of improving the accuracy of translation output, which can later be used for text categorization. In this chapter we describe an approach to text classification based on parallel corpora and natural language processing techniques. A text classifier is built on multilingual texts by translating the features of the model using the Expectation Maximization (EM) algorithm. Cross-lingual text classification is the process of classifying text across different languages during translation by using training data. The main idea underlying this mechanism is to use training data from a parallel corpus and apply classification algorithms to reduce distortion and alignment errors in machine translation. A classification model is trained that maps the source language to the target language on the basis of translation knowledge and defined parameters. The EM algorithm removes ambiguity in parallel corpora by aligning each source sentence to its target sentence. It considers the possible translations from the source language to the target language and selects the one that best fits the model on the basis of the BLEU (bilingual evaluation understudy) score. The only requirement of this learning procedure is unlabelled data in the target language. The algorithm can be evaluated accurately by running a separate classifier on different parallel corpora. We use monolingual corpora and machine translation in our study to see the effect of both models on our parallel corpora.
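
To make the alignment step concrete, the following is a minimal sketch of EM-based word alignment on a toy parallel corpus. It assumes IBM Model 1, a standard textbook instance of EM alignment; the abstract does not say which alignment model the chapter uses, so this is illustrative rather than the authors' method. The function names `train_ibm_model1` and `align`, the three-sentence corpus, and the iteration count are all hypothetical.

```python
from collections import defaultdict

def train_ibm_model1(pairs, iterations=20):
    """Estimate t[(target_word, source_word)] = P(target_word | source_word)
    with EM. `pairs` is a list of (source_tokens, target_tokens).
    Assumption: IBM Model 1 without a NULL source word."""
    tgt_vocab = {tw for _, tgt in pairs for tw in tgt}
    # Uniform initialisation over word pairs that co-occur in some sentence pair.
    t = {(tw, sw): 1.0 / len(tgt_vocab)
         for src, tgt in pairs for tw in tgt for sw in src}

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (target, source) links
        total = defaultdict(float)  # expected count of links per source word
        # E-step: distribute each target word over the source words
        # of its sentence, in proportion to the current probabilities.
        for src, tgt in pairs:
            for tw in tgt:
                norm = sum(t[(tw, sw)] for sw in src)
                for sw in src:
                    frac = t[(tw, sw)] / norm
                    count[(tw, sw)] += frac
                    total[sw] += frac
        # M-step: renormalise the expected counts into probabilities.
        for (tw, sw), c in count.items():
            t[(tw, sw)] = c / total[sw]
    return t

def align(src, tgt, t):
    """Link each target word to its most probable source word."""
    return [(tw, max(src, key=lambda sw: t.get((tw, sw), 0.0))) for tw in tgt]

# Hypothetical toy corpus: German source, English target.
corpus = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
    (["ein", "Buch"], ["a", "book"]),
]
t = train_ibm_model1(corpus, iterations=20)
alignment = align(["das", "Haus"], ["the", "house"], t)
```

Because "das" co-occurs with "the" in two sentence pairs, each EM iteration shifts probability mass toward that link, which is how the ambiguity in the parallel corpus is resolved without any labelled alignments. Selecting among candidate translations by BLEU score, as the abstract describes, would be a separate step not shown here.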
