Abstract

Splitting is a conventional process in most Indian languages, governed by their grammar rules. It is called ‘pada vicchEdanam’ (a Sanskrit term for word splitting) and is widely used across Indian languages. Splitting plays a key role in Machine Translation (MT), particularly when the source language (SL) is an Indian language. Although splitting may not always succeed in extracting the root words from which a compound is formed, it has a considerable impact as an important phase of Natural Language Processing (NLP). Of the many types of splitting, this paper considers only consonant based and phrase based splitting.

Highlights

  • Combining (conjunction of) two or more words to form bigrams or n-grams is a conventional process in Indian languages and plays an important role [1]; for instance, ‘vibhakti’ (inflection) is attached to a root word that can be a noun, pronoun, verb, etc.

  • ‘valana’ is an inflection and relates strictly to ‘rAmuDi’. If this sentence is written as ‘rAvANuDi valana rAmuDi cAvu’, the meaning changes drastically. All these examples show that when the inflections are properly attached to the appropriate root words, word order is not obligatory; otherwise, a change in word order distorts the meaning and may render the sentence meaningless.

  • 3) AmrEDita sandhi: This ‘sandhi’ produces the consonant ‘TT’ in the resulting compound, as dviruktaTakAra sandhi does. It forms compounds in three ways, but this paper considers only one: the second type does not produce a consonant, and the third type is more ambiguous, its nature unidentifiable, and is discussed as a special issue in this concept (Table 12).
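A minimal sketch of consonant based splitting for the ‘TT’ type of AmrEDita sandhi, under a stated assumption: the compound repeats a word and shortens the first copy to its initial segment followed by ‘TT’ (for instance, ‘kaTTakaDa’ from ‘kaDa’ + ‘kaDa’). The lexicon, the function name `split_amredita` and the matching heuristic are illustrative assumptions, not the paper's rules from Table 12.

```python
# Hypothetical toy lexicon of romanized root words (illustrative only).
LEXICON = {"kaDa", "tuda", "eduru"}


def split_amredita(compound, lexicon=LEXICON, marker="TT"):
    """Try to split a reduplicated compound around the consonant `marker`.

    Assumed heuristic: compound = head + marker + rest, where some lexicon
    word w ends `rest` and begins with `head`. Returns (w, w), since the
    compound repeats the same root, or None if no split is found.
    """
    start = compound.find(marker)
    while start != -1:
        head = compound[:start]
        rest = compound[start + len(marker):]
        if head:
            # Prefer longer roots; break ties alphabetically for determinism.
            for word in sorted(lexicon, key=lambda w: (-len(w), w)):
                if rest.endswith(word) and word.startswith(head):
                    return word, word
        # Try the next occurrence of the marker, if any.
        start = compound.find(marker, start + 1)
    return None


if __name__ == "__main__":
    for form in ["kaTTakaDa", "tuTTatuda", "eTTeduru", "kaDa"]:
        print(form, "->", split_amredita(form))
```

The heuristic only recovers roots already in the lexicon; the ambiguous third type mentioned above would need additional rules.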

Summary

INTRODUCTION

Combining (conjunction of) two or more words to form bigrams or n-grams is a conventional process in Indian languages and plays an important role [1]; for instance, ‘vibhakti’ (inflection) is attached to a root word that can be a noun, pronoun, verb, etc. If the two are not combined, word order affects the sentence considerably and may change its meaning or render it meaningless. For instance, the sentence can be written as ‘pAmu kRshNuDu tO ADutunnADu’, which has an absurd meaning. Another example is ‘rAmuDi valana rAvANuDi cAvu’ (rAvaNa’s death is due to rAma). If this sentence is written as ‘rAvANuDi valana rAmuDi cAvu’ (rAma’s death is due to rAvaNa), the meaning changes drastically. All these examples show that when the inflections are properly attached to the appropriate root words, word order is not obligatory; otherwise, a change in word order distorts the meaning and may render the sentence meaningless. Consequently, splitting such compounds is a mandatory step in MT, improving both the ease and the accuracy of translation.
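A minimal sketch of the splitting step motivated above, assuming romanized input and a small, hypothetical inventory of attached ‘vibhakti’ suffixes. The names `VIBHAKTI_SUFFIXES`, `split_vibhakti`, `split_sentence` and the example word ‘kRshNuDutO’ are illustrative, not taken from the paper. It detaches a suffix by longest match and does not model sandhi changes at the word boundary.

```python
# Hypothetical, illustrative subset of attachable inflection suffixes.
VIBHAKTI_SUFFIXES = ["tO", "ki", "ni", "lO"]


def split_vibhakti(word, suffixes=VIBHAKTI_SUFFIXES):
    """Split one inflected word into (root, suffix) by longest suffix match.

    Returns (word, None) when no listed suffix applies or when stripping
    would leave an empty root.
    """
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)], suffix
    return word, None


def split_sentence(sentence):
    """Apply split_vibhakti to every whitespace-separated token."""
    return [split_vibhakti(token) for token in sentence.split()]


if __name__ == "__main__":
    print(split_sentence("kRshNuDutO pAmu"))
```

Because each root keeps its own suffix after splitting, the (root, suffix) pairs carry the grammatical relation independently of token order; a plain suffix list will, however, over-split roots that happen to end in a listed string, so a real system needs a lexicon check.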

ISSUES IN CONJUNCTION AND SPLITTING
SANDHIS AS AN AID FOR SPLITTING
CONSONANT BASED SPLITTING
PHRASE BASED SPLITTING
CONCLUSION