Abstract
The statistical approach to machine translation is highly popular in the automatic translation research area and is a promising way to achieve good accuracy. Efforts have been made to develop an Urdu to Punjabi statistical machine translation system. The system uses an incremental training approach to train the statistical model. Instead of a parallel sentence corpus, manually mapped phrases were used to train the model. In the preprocessing phase, various rules were used for tokenization and segmentation. Along with these rules, a text classification system was implemented to assign input text to predefined classes, and the decoder translates the given text according to the domain selected by the text classifier. The system uses a Hidden Markov Model (HMM) for the learning process and the Viterbi algorithm for decoding. Experiments and evaluation have shown that a simple statistical model such as an HMM yields good accuracy for a closely related language pair like Urdu-Punjabi. The system achieved a BLEU score of 0.86 and more than 85% accuracy in manual testing.
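The HMM/Viterbi decoding step can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `viterbi`, the romanized tokens, the state set and every probability below are invented for illustration. We assume that hidden states stand for candidate Punjabi words, observations for Urdu input tokens, emission probabilities for entries read off a manually mapped phrase table, and transition probabilities for a bigram model over Punjabi words.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (best log-probability, best state path) for an HMM.

    Probabilities are combined in log space to avoid underflow; any
    missing entry is treated as probability 0 (log = -inf).
    """
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: best log-probability of any state path ending in s at step t
    best = [{s: log(start_p.get(s, 0)) + log(emit_p.get(s, {}).get(observations[0], 0))
             for s in states}]
    back = [{s: None for s in states}]  # back-pointers for path recovery

    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            emit = log(emit_p.get(s, {}).get(observations[t], 0))
            # Choose the predecessor r that maximises best[t-1][r] + log P(s | r)
            prev, score = max(
                ((r, best[t - 1][r] + log(trans_p.get(r, {}).get(s, 0))) for r in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + emit
            back[t][s] = prev

    # Trace the back-pointers from the highest-scoring final state
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return best[-1][last], path


# Hypothetical toy model (romanized tokens, made-up probabilities), NOT the paper's data.
# Hidden states: candidate Punjabi words; observations: Urdu input tokens.
states = ["oh", "uh", "ghar", "gaya"]
start_p = {"oh": 0.5, "uh": 0.5}
trans_p = {"oh": {"ghar": 0.6}, "uh": {"ghar": 0.1}, "ghar": {"gaya": 0.7}}
emit_p = {  # P(Urdu token | Punjabi word), assumed to come from a phrase table
    "oh": {"woh": 0.5},
    "uh": {"woh": 0.5},
    "ghar": {"ghar": 1.0},
    "gaya": {"gaya": 1.0},
}

log_prob, best_path = viterbi(["woh", "ghar", "gaya"], states, start_p, trans_p, emit_p)
print(best_path, round(math.exp(log_prob), 3))
```

In this toy setting the ambiguous source token `woh` has equal emission probability under `oh` and `uh`, so the transition probabilities decide (0.6 for oh→ghar versus 0.1 for uh→ghar), and the best path `oh ghar gaya` has probability 0.5 × 0.5 × 0.6 × 0.7 = 0.105.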
Highlights
Machine translation is a burning topic in the area of artificial intelligence
Many machine translation systems have been developed for Indo-Aryan languages [Garje G V, 2013]
Resource-poor languages: like other Indo-Aryan languages, Urdu and Punjabi are relatively new to the natural language processing area
Summary
Machine translation is a burning topic in the area of artificial intelligence. In this digital era, communities across the world are connected to each other and share a vast amount of resources. In such a digital environment, differences between natural languages are the main obstacle to communication. Various approaches have been developed to translate between natural languages, such as rule-based, example-based, statistical and various hybrid approaches. Among these, the statistical approach is quite dominant and popular in the machine translation research community. Collecting parallel phrases was more convenient than collecting parallel sentences.