Urdu to Punjabi Machine Translation: An Incremental Training Approach

Umrinderpal Singh, Gurpreet Singh, Vishal Goyal

doi:10.14569/ijacsa.2016.070428

Abstract

The statistical machine translation approach is highly popular in the automatic translation research area and is a promising approach for achieving good accuracy. Efforts have been made to develop an Urdu to Punjabi statistical machine translation system. The system uses an incremental training approach to train the statistical model. In place of a parallel sentence corpus, manually mapped phrases were used to train the model. In the preprocessing phase, various rules were used for tokenization and segmentation. Along with these rules, a text classification system was implemented to classify input text into predefined classes, and the decoder translates the given text according to the domain selected by the text classifier. The system uses a Hidden Markov Model (HMM) for the learning process and the Viterbi algorithm for decoding. Experiments and evaluation have shown that a simple statistical model like HMM yields good accuracy for a closely related language pair like Urdu-Punjabi. The system achieved a BLEU score of 0.86 and more than 85% accuracy in manual testing.
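To illustrate the HMM-plus-Viterbi idea named in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes target phrases act as hidden states, source phrases as observations, a translation table supplies emission probabilities, and a bigram language model supplies transition probabilities. The function name, the tokens "u1", "p1", "p2a", "p2b", all probability values, and the smoothing floor are hypothetical placeholders.

```python
import math

def viterbi(source, candidates, emit, trans, start):
    """Return the most probable target sequence for a non-empty `source`.

    candidates[s] -> list of target options for source token s
    emit[(t, s)]  -> P(s | t), translation-model probability
    trans[(a, b)] -> P(b | a), bigram language-model probability
    start[t]      -> P(t at sentence start)
    """
    floor = 1e-9  # assumed smoothing floor for unseen events

    def lg(p):
        return math.log(max(p, floor))

    # best[t] = (log-probability of the best path ending in t, that path)
    best = {
        t: (lg(start.get(t, 0.0)) + lg(emit.get((t, source[0]), 0.0)), [t])
        for t in candidates[source[0]]
    }
    for s in source[1:]:
        step = {}
        for t in candidates[s]:
            # Pick the predecessor that maximizes path score plus transition.
            prev, (score, path) = max(
                best.items(),
                key=lambda kv: kv[1][0] + lg(trans.get((kv[0], t), 0.0)),
            )
            total = score + lg(trans.get((prev, t), 0.0)) + lg(emit.get((t, s), 0.0))
            step[t] = (total, path + [t])
        best = step
    return max(best.values(), key=lambda v: v[0])[1]

# Hypothetical toy tables: "u*" are source tokens, "p*" are target tokens.
source = ["u1", "u2"]
candidates = {"u1": ["p1"], "u2": ["p2a", "p2b"]}
emit = {("p1", "u1"): 1.0, ("p2a", "u2"): 0.6, ("p2b", "u2"): 0.4}
trans = {("p1", "p2a"): 0.1, ("p1", "p2b"): 0.9}
start = {"p1": 1.0}

result = viterbi(source, candidates, emit, trans, start)
print(result)  # ['p1', 'p2b']
```

In this toy case the language model overrides the locally more probable translation: "p2a" has the higher emission probability for "u2", but "p2b" wins because it follows "p1" far more often.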

Highlights

Machine translation is a burning topic in the area of artificial intelligence
Many machine translation systems have been developed for Indo-Aryan languages [Garje G V, 2013]
Resource-poor languages: like other Indo-Aryan languages, Urdu and Punjabi are new to the natural language processing area

Summary

INTRODUCTION

Machine translation is a burning topic in the area of artificial intelligence. In this digital era, different communities across the world are connected to each other and share a vast amount of resources. In this kind of digital environment, different natural languages are the main obstacle to communication. Various approaches have been developed to decode natural languages, such as rule-based, example-based, statistical, and various hybrid approaches. Among these, the statistical approach is dominant and popular in the machine translation research community. Collecting parallel phrases was more convenient than collecting parallel sentences.

URDU AND PUNJABI: A CLOSELY RELATED LANGUAGE PAIR

Resource-poor languages

Spelling variation

Free word order

Segmentation issues in Urdu

Morphologically rich languages

Words without diacritical marks

METHODOLOGY

Tokenization and segmentation process

Text Classification

Translation and Language model Training

EXPERIMENT AND EVALUATION

Findings

CONCLUSION


Journal: International Journal of Advanced Computer Science and Applications
Publication Date: Jan 1, 2016
Citations: 8
License type: cc-by
