Abstract
Morphological variation in the text of highly inflected languages hampers computer processing and root-word determination when extracting an abstract. As a remedy, a lemmatization algorithm is developed, and its effectiveness is evaluated for Word Sense Disambiguation (WSD). Given its usefulness, a lemmatizer is a natural building block for Natural Language Processing tools for morphologically rich languages. Assamese, spoken by over 14 million people in the North-Eastern region of India, is one such highly inflected Indian language. In the present work, after a detailed study of the possible transformations through which surface words are created from lemmas, we have designed an Assamese lemmatizer that applies suitable reverse transformations to a surface word to recover the corresponding lemma. The lemmatizer handles both inflectional and derivational morphology in Assamese. It was evaluated on various Assamese articles extracted from the Assamese Corpus, comprising 50,000 surface words (excluding proper nouns), and achieved 82% accuracy. This result is encouraging, as Assamese is a low-resource language and no prior research work has addressed the lemmatization of Assamese words. In light of this result, the lemmatizer is then evaluated for Assamese WSD, for which 10 highly polysemous Assamese words are selected for sense disambiguation. We considered several WSD systems and observed that the lemmatizer improves the effectiveness of all of them, and that the improvement is statistically significant.
More From: ACM Transactions on Asian and Low-Resource Language Information Processing