Abstract

Indian languages belong to four language families, namely Indo-Aryan, Dravidian, Tibeto-Burman, and Austro-Asiatic. Hindi and Kannada belong to the Indo-Aryan and Dravidian families respectively; their scripts evolved from the ancient Brahmi script and share a common phonetic structure. However, their conventions for writing Named Entities differ because of dialectal influence, language-specific rules, and other factors. As a result, Named Entity Transliteration from Hindi to Kannada, and vice versa, is not a one-to-one character mapping. This causes many problems in Machine Translation (MT), Cross-Lingual Information Retrieval (CLIR), and parallel corpus creation between Hindi and Kannada. This paper discusses the Named Entity Transliteration issues encountered while creating Hindi-to-Kannada parallel corpora for the Indian Language Corpus Initiative (ILCI) project. We discuss in detail the cases where a Hindi character has no exact equivalent in Kannada, multiple mappings, diacritic marks, loan words, and language-specific transliteration issues, and we propose possible solutions to these problems. At the implementation level, one may use either Finite-State Transducers (FST) or Regular Expressions.
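As a rough illustration of the regular-expression route mentioned above, the following Python sketch combines a plain character mapping with a small table of exception rules. It relies on the fact that the Unicode Devanagari block (U+0900 to U+097F) and Kannada block (U+0C80 to U+0CFF) are laid out in parallel, so most characters map by a fixed code point offset. Everything else is an assumption made for illustration and is not taken from the paper: the function name `hi_to_kn`, the rule table `PRE_RULES`, and the two example rules (dropping the nukta and replacing candrabindu with anusvara).

```python
import re
import unicodedata

DEV_START, DEV_END = 0x0900, 0x097F
OFFSET = 0x0C80 - 0x0900  # distance between the Devanagari and Kannada blocks

# Precomposed Devanagari nukta letters (U+0958..U+095F) have no Kannada
# counterpart at the same offset, so fold them to their base consonants.
# Folding is an illustrative choice, not the paper's prescribed solution.
PRECOMPOSED_NUKTA = str.maketrans({
    0x0958: "\u0915", 0x0959: "\u0916", 0x095A: "\u0917", 0x095B: "\u091C",
    0x095C: "\u0921", 0x095D: "\u0922", 0x095E: "\u092B", 0x095F: "\u092F",
})

# Hypothetical exception rules, applied before the block-offset mapping.
PRE_RULES = [
    (re.compile("\u093C"), ""),        # drop a combining nukta
    (re.compile("\u0901"), "\u0902"),  # candrabindu -> anusvara
]

def hi_to_kn(text: str) -> str:
    """Map Devanagari text to Kannada script (illustrative sketch only)."""
    text = text.translate(PRECOMPOSED_NUKTA)
    for pattern, replacement in PRE_RULES:
        text = pattern.sub(replacement, text)
    out = []
    for ch in text:
        cp = ord(ch)
        if DEV_START <= cp <= DEV_END:
            target = chr(cp + OFFSET)
            # Keep the source character when the Kannada slot is unassigned,
            # for example the danda U+0964, which all Indic scripts share.
            out.append(target if unicodedata.name(target, None) else ch)
        else:
            out.append(ch)
    return "".join(out)

if __name__ == "__main__":
    print(hi_to_kn("\u0915\u092E\u0932"))  # Devanagari ka, ma, la
```

The offset mapping handles the characters that do correspond one to one; the cases the abstract lists (missing equivalents, multiple mappings, diacritics, loan words) are where additional entries in a rule table such as `PRE_RULES`, or states in an FST, would be needed.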

