Abstract

The lack of adequate lexical resources is a pervasive problem that hinders the development of natural language processing tools for less widely spoken languages. Recently, projects such as Indo WordNet have significantly reduced the scarcity of lexicons for Indian languages. However, their coverage remains a concern, and the cost and time required to build them are further limiting factors. The reluctance to automate lexicon generation is largely attributed to the poor precision of automatically generated synsets. In this paper, we address these issues by incorporating language-specific knowledge resources, which ensure the authenticity of the generated synsets and allow endemic words to be included. We propose a corpus-based approach to automated synset generation that noticeably improves the quality of the generated synsets. Experiments on a manually created dataset of Hindi words yield a precision of 81.56% and an F-measure of more than 72%.
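The abstract does not spell out the evaluation protocol, so the following is only a minimal sketch of how precision, recall, and F-measure are conventionally computed when a generated synset is compared against a manually created one. It assumes the standard set-based definitions; the function name `precision_recall_f1` and the toy Hindi words are hypothetical illustrations, not data or code from the paper.

```python
def precision_recall_f1(generated, gold):
    """Set-based precision, recall, and F1 for one generated synset.

    Assumption: a generated word counts as correct only if it also
    appears in the manually created (gold) synset.
    """
    generated, gold = set(generated), set(gold)
    true_positives = len(generated & gold)

    precision = true_positives / len(generated) if generated else 0.0
    recall = true_positives / len(gold) if gold else 0.0

    if precision + recall == 0:
        f1 = 0.0
    else:
        # F1 is the harmonic mean of precision and recall.
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy example (not taken from the paper's dataset):
# candidate synonyms for "water", one of which is wrong.
generated_synset = ["जल", "पानी", "नीर", "आग"]
gold_synset = ["जल", "पानी", "नीर", "वारि", "अंबु"]

p, r, f = precision_recall_f1(generated_synset, gold_synset)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

In this toy case three of the four generated words are correct and three of the five gold words are recovered, so precision is higher than recall. A high-precision, lower-recall pattern of this kind is consistent with reporting a precision above the F-measure.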
