Abstract

The paper uses a hierarchical phrase-based model to develop Statistical Machine Translation (SMT) systems for four low-resourced South Asian languages. Machine translation from South Asian languages into other languages (mainly English) has predominantly relied on traditional statistical and neural machine translation approaches. However, translation accuracy remains limited because these languages lack the necessary natural language resources and tools; hence, they are classified as low-resourced languages. Any SMT system needs large parallel corpora to perform well, so the unavailability of such corpora constrains the success of machine translation for these languages. Another reason for poor translation quality is the grammatical difference between South Asian languages and English: the former are morphologically rich and have a different sentence structure. However, traditional SMT systems use the default distortion reordering model, which reorders phrases within a sentence independently of their context. To overcome this problem, hierarchical phrase-based translation, which uses grammar rules of a Synchronous Context-Free Grammar (SCFG), is proposed. This paper considers English to Tamil, Tamil to English, Malayalam to English, English to Malayalam, Tamil to Sinhala and Sinhala to Tamil translation. Finally, we evaluate the systems using BLEU as the evaluation metric. The hierarchical phrase-based model shows better results than the traditional approach for the Tamil-English and Malayalam-English pairs. It achieves a BLEU score of 11.18 for Sinhala to Tamil and 10.73 for Tamil to Sinhala.

Keywords: Hierarchical phrase-based model; Statistical machine translation; Parallel corpus; Natural language processing
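The abstract names SCFG rules as the mechanism that lets the hierarchical model reorder in context instead of relying on a context-independent distortion model. The sketch below is a minimal illustration under stated assumptions, not the paper's system: the rule format with co-indexed nonterminals `X1`, `X2` follows the common hierarchical phrase-based convention, while the class `Rule`, the functions `match` and `translate`, the `RULES` table and the romanized Tamil words (a rough transliteration used only to show English SVO order becoming Tamil SOV order) are all hypothetical. Real systems extract large numbers of such rules from word-aligned parallel corpora and score competing derivations; this toy simply returns the first derivation it finds.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Rule:
    """Hypothetical hierarchical rule X -> <source, target>.

    Nonterminals are written "X1", "X2", ...; the same index on both sides
    links the two gaps, so one rule can reorder whatever fills them.
    """
    source: Tuple[str, ...]
    target: Tuple[str, ...]


def is_nonterminal(symbol: str) -> bool:
    return symbol.startswith("X") and symbol[1:].isdigit()


def match(pattern: Tuple[str, ...], words: List[str],
          rules: List[Rule]) -> Optional[Dict[str, List[str]]]:
    """Match a rule's source side against `words`.

    Returns the translation of each nonterminal gap, or None if no match.
    """
    if not pattern:
        return {} if not words else None
    head, rest = pattern[0], pattern[1:]
    if not is_nonterminal(head):
        # Terminal: must equal the next source word exactly.
        if words and words[0] == head:
            return match(rest, words[1:], rules)
        return None
    # Nonterminal: try every span that leaves at least one word for each
    # remaining symbol (when symbols follow, the gap stays shorter than the input).
    for end in range(1, len(words) - len(rest) + 1):
        sub = translate(words[:end], rules)
        if sub is None:
            continue
        tail = match(rest, words[end:], rules)
        if tail is not None:
            return {head: sub, **tail}
    return None


def translate(words: List[str], rules: List[Rule]) -> Optional[List[str]]:
    """Return the target side of the first derivation found, or None.

    Unlike a real decoder, no scores are computed, and unary rules such as
    X -> <X1, X1> are not supported (they would recurse forever).
    """
    for rule in rules:
        gaps = match(rule.source, words, rules)
        if gaps is not None:
            output: List[str] = []
            for symbol in rule.target:
                output.extend(gaps[symbol] if is_nonterminal(symbol) else [symbol])
            return output
    return None


# Toy English -> Tamil rules (romanized, illustrative only).
RULES = [
    Rule(("the", "boy"), ("paiyan",)),
    Rule(("rice",), ("saadam",)),
    # Hierarchical rule: English SVO becomes Tamil SOV, verb moved to the end.
    Rule(("X1", "eats", "X2"), ("X1", "X2", "saappidugiraan")),
]

if __name__ == "__main__":
    print(" ".join(translate("the boy eats rice".split(), RULES)))
```

Because the single hierarchical rule moves the verb to the end whatever fills its gaps, the same rule also handles `translate("rice eats the boy".split(), RULES)`, producing `saadam paiyan saappidugiraan` without any additional reordering rule.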
