Abstract

This paper addresses the problem of extracting a single lemma from the different morphological variants of a dictionary root word. Popular existing lemmatizers such as the Stanford LemmaProcessor, the spaCy Lemmatizer, LemmaGen, and MorphAdorner generate correct lemmas for singular and plural nouns and for verbs in different tenses, but they fail to derive the correct lemma for derived words, especially nominalized ones. The proposed lemmatizer, ‘LemmaQuest’, is designed and implemented to overcome this limitation. LemmaQuest first groups related morphological variants together: singular and plural nouns, verbs in all tenses, and nominalized words. The groups are formed by combining several statistical distance measures computed over all possible pairs of input words. A lemma is then generated for each group. The main objective of the model is to extract the correct lemma for a large set of input words in optimized time, which in turn improves text simplification, keyword extraction, text summarization, and other text mining applications.
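
The two-stage idea described above (pairwise distances, then grouping, then one lemma per group) can be illustrated with a minimal Python sketch. The abstract does not say which distance measures LemmaQuest combines or how it derives the lemma, so everything specific here is an assumption made for illustration: the mix of normalized edit distance and character-bigram Jaccard distance, the 0.5 threshold, the transitive merging of close pairs, and the "shortest member" lemma rule. The function names are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def bigrams(word: str) -> set:
    """Set of adjacent character pairs in the word."""
    return {word[i:i + 2] for i in range(len(word) - 1)}


def combined_distance(a: str, b: str) -> float:
    """Assumed combination: mean of normalized edit distance and bigram Jaccard distance."""
    edit = levenshtein(a, b) / max(len(a), len(b))
    ga, gb = bigrams(a), bigrams(b)
    union = ga | gb
    jaccard = 1 - len(ga & gb) / len(union) if union else 0.0
    return (edit + jaccard) / 2


def group_words(words, threshold=0.5):
    """Merge every pair whose distance is within the (assumed) threshold, transitively."""
    parent = {w: w for w in words}

    def find(w):
        # Follow parent links to the group representative, shortening the path.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    for a, b in combinations(words, 2):
        if combined_distance(a, b) <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for w in words:
        groups.setdefault(find(w), []).append(w)
    return list(groups.values())


def pick_lemma(group):
    """Placeholder rule (an assumption): the shortest member, ties broken alphabetically."""
    return min(group, key=lambda w: (len(w), w))
```

With the input `["connect", "connects", "connected", "connection", "walk", "walked", "walking"]`, `group_words` yields two groups, one per root, and `pick_lemma` returns `connect` and `walk`. The shortest-member rule only works when the root itself is among the inputs, so a real lemmatizer would need a more careful generation step.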
