Abstract
Human evaluation of machine translation plays a central role in improving the accuracy of translation output, which can then be used for downstream text categorization. In this chapter we describe an approach to text classification based on parallel corpora and natural language processing techniques. A text classifier is built on multilingual texts by translating the features of the model using the Expectation Maximization (EM) algorithm. Cross-lingual text classification assigns categories to text in one language using training data from another language. The main idea is to draw training data from a parallel corpus and apply classification algorithms to reduce distortion and alignment errors in machine translation. We train a classification model that maps source language to target language on the basis of translation knowledge and defined parameters. The EM algorithm removes ambiguity in the parallel corpora by aligning each source sentence to its target sentence. It considers the possible translations from source to target language and selects the one that best fits the model according to its BLEU (bilingual evaluation understudy) score. The only requirement of this learning procedure is unlabelled data in the target language. The algorithm can be evaluated by running a separate classifier on different parallel corpora. We use monolingual corpora and machine translation in our study to observe the effect of both on our parallel corpora.
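To make the role of Expectation Maximization on a parallel corpus concrete, the following is a minimal sketch of EM-based alignment in the style of IBM Model 1, which learns word translation probabilities from sentence pairs. The abstract does not specify the exact model, so this formulation is an assumption and not necessarily the authors' method. The function names `train_ibm1` and `align` and the three-sentence toy corpus are hypothetical and serve only as illustration.

```python
from collections import defaultdict


def train_ibm1(pairs, iterations=20):
    """Estimate t(target_word | source_word) with EM (IBM Model 1 style, no NULL word).

    pairs: list of (source_tokens, target_tokens) from a parallel corpus.
    Returns a dict keyed by (target_word, source_word).
    """
    target_vocab = {tw for _, tgt in pairs for tw in tgt}
    uniform = 1.0 / len(target_vocab)
    # Start from a uniform distribution over all co-occurring word pairs.
    t = {(tw, sw): uniform for src, tgt in pairs for tw in tgt for sw in src}

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (target, source) links
        total = defaultdict(float)  # expected count of links per source word
        # E-step: distribute each target word over the source words of its sentence.
        for src, tgt in pairs:
            for tw in tgt:
                norm = sum(t[(tw, sw)] for sw in src)
                for sw in src:
                    frac = t[(tw, sw)] / norm
                    count[(tw, sw)] += frac
                    total[sw] += frac
        # M-step: renormalize the expected counts into probabilities.
        for (tw, sw), c in count.items():
            t[(tw, sw)] = c / total[sw]
    return t


def align(src, tgt, t):
    """For each target word, return the index of its most probable source word."""
    return [
        max(range(len(src)), key=lambda i: t.get((tw, src[i]), 0.0))
        for tw in tgt
    ]


# Toy parallel corpus (illustrative data, not taken from the study).
pairs = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]
t = train_ibm1(pairs)
print(align("das haus".split(), "the house".split(), t))
```

Each iteration sharpens the translation table: words that co-occur consistently across sentence pairs accumulate probability mass, which is how EM resolves the alignment ambiguity that a single sentence pair cannot resolve on its own.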