Abstract

This article reports on an improved modified joint source-channel model for transliterating Bengali Named Entities (NEs) into English and vice versa. A number of alternatives to this model have also been considered, and every model in the transliteration system has been developed both with and without linguistic knowledge. A Bengali NE is divided into Transliteration Units (TUs) with the pattern C+M, where C represents a consonant, a vowel or a conjunct and M represents the vowel modifier (matra). An English NE is divided into TUs with the pattern C*V*, where C represents a consonant and V represents a vowel. The system learns mappings automatically from a bilingual training set of 25,000 named entities: aligned transliteration units, along with their contexts, are derived automatically from this set to generate the collocational statistics. The alignment is implemented in two ways, one unguided and one guided by linguistic knowledge, which the system encodes as the possible conjuncts and diphthongs in Bengali and their corresponding representations in English. Experimental results of the 5-fold cross-validation test have demonstrated that the improved modified joint source-channel model with linguistic knowledge performs best for Bengali to English transliteration, with a Word Agreement Ratio (WAR) of 81.4% and a Transliteration Unit Agreement Ratio (TUAR) of 95.7%. The same model also performs best in the 5-fold cross-validation test for English to Bengali transliteration, with a WAR of 79.5% and a TUAR of 93.8%. Finally, the transliteration models that do not use linguistic knowledge have been evaluated on Hindi to English (H2E) and Telugu to English (T2E) transliteration using smaller datasets.
These results indicate that the proposed transliteration algorithm works effectively for language pairs that share comparable orthography.
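To make the English-side segmentation concrete, the following is a minimal sketch of splitting a romanized name into TUs matching the C*V* pattern described above. It is not the authors' implementation: the function name `segment_english` is hypothetical, the vowel set is assumed to be a, e, i, o, u (with "y" treated as a consonant), and the linguistically guided handling of diphthongs is not modeled.

```python
import re

# Assumption: these five letters are the vowels; every other Latin letter is a consonant.
VOWELS = "aeiou"
CONSONANTS = "bcdfghjklmnpqrstvwxyz"

# C*V*: a run of consonants followed by a (possibly empty) run of vowels,
# or a bare vowel run. Splitting the pattern this way avoids the empty match
# that a plain "C*V*" regex would also produce. Non-letters (spaces, hyphens)
# match neither branch and are therefore skipped.
_TU_PATTERN = re.compile(rf"[{CONSONANTS}]+[{VOWELS}]*|[{VOWELS}]+")


def segment_english(name: str) -> list[str]:
    """Hypothetical helper: split an English name into C*V* transliteration units."""
    return _TU_PATTERN.findall(name.lower())


if __name__ == "__main__":
    print(segment_english("sourav"))  # ['sou', 'ra', 'v']
    print(segment_english("arun"))    # ['a', 'ru', 'n']
```

Because the regex is greedy, each unit absorbs every consonant up to the next vowel run, so a name such as "ganguly" yields `['ga', 'ngu', 'ly']`.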
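The two evaluation measures can be illustrated with a short sketch. It assumes the definitions commonly used for these metrics, WAR = (S - Err) / S over S test names and TUAR = (L - Err') / L over L reference TUs. The position-by-position TU comparison is a simplifying assumption, since the abstract does not specify how system and reference TUs are matched, and the function names are hypothetical.

```python
from typing import Sequence


def word_agreement_ratio(outputs: Sequence[str], references: Sequence[str]) -> float:
    """WAR = (S - Err) / S: fraction of names transliterated exactly right."""
    if len(outputs) != len(references) or not references:
        raise ValueError("need equally sized, non-empty output and reference lists")
    errors = sum(out != ref for out, ref in zip(outputs, references))
    return (len(references) - errors) / len(references)


def tu_agreement_ratio(
    output_tus: Sequence[Sequence[str]], reference_tus: Sequence[Sequence[str]]
) -> float:
    """TUAR = (L - Err') / L over all reference TUs.

    Assumption: TUs are compared position by position; a reference TU with no
    output counterpart counts as an error, and surplus output TUs are ignored.
    """
    if len(output_tus) != len(reference_tus) or not reference_tus:
        raise ValueError("need equally sized, non-empty TU lists")
    total = 0
    errors = 0
    for out, ref in zip(output_tus, reference_tus):
        total += len(ref)
        for i, ref_tu in enumerate(ref):
            if i >= len(out) or out[i] != ref_tu:
                errors += 1
    return (total - errors) / total


if __name__ == "__main__":
    # One of three names is wrong, so WAR = 2/3.
    print(word_agreement_ratio(["sourav", "arun", "amit"], ["sourav", "arun", "amith"]))
    # One of six reference TUs is wrong, so TUAR = 5/6.
    print(tu_agreement_ratio([["sou", "ra", "v"], ["a", "ru", "n"]],
                             [["sou", "ra", "bh"], ["a", "ru", "n"]]))
```

TUAR is typically higher than WAR, as in the reported figures, because a single wrong unit makes the whole name count as an error under WAR while costing only one unit under TUAR.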
