LemmaQuest Lemmatizer: A Morphological Analyzer Handling Nominalization

Rupam Gupta, Anjali G Jivani

doi:10.1080/03772063.2021.2013328

Abstract

This paper addresses the problem of extracting a single lemma from the different morphological variants of a particular dictionary root word. Popular existing lemmatizers available online, such as the Stanford LemmaProcessor, the spaCy lemmatizer, LemmaGen and MorphAdorner, generate correct lemmas for singular and plural nouns and for verbs in different tenses. However, none of them can derive the correct lemma for derived words of any type, especially nominalized derived words. The proposed lemmatizer, ‘LemmaQuest’, is designed and implemented to overcome these limitations. LemmaQuest first creates distinct groups of related morphological variants, such as singular and plural nouns, verbs in all tenses, and nominalized words. These groups are formed using a combination of different statistical distance measures computed over all possible pairs of input words. Lemmas are then generated for each group. The main objective of the proposed model is to extract the correct lemma for a large set of input words in optimized time, which leads to a substantial improvement in text simplification, keyword extraction, text summarization and other text mining applications.
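The grouping step can be illustrated with a small sketch. The abstract does not say which statistical distance measures LemmaQuest combines, how they are weighted, what threshold is used, or how the lemma is chosen within a group. Every one of those choices below is therefore a hypothetical assumption for illustration only: `combined_distance` averages normalized Levenshtein distance, character-bigram Jaccard distance and a common-prefix distance; `group_words` links every pair of input words whose combined distance is at most an assumed threshold of 0.4 (using union-find); and `lemma_for` simply picks the shortest member of each group. This is a minimal sketch of pairwise-distance grouping, not the authors' implementation.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def bigrams(word: str) -> set:
    """Set of character bigrams, e.g. 'cat' -> {'ca', 'at'}."""
    return {word[i:i + 2] for i in range(len(word) - 1)}


def combined_distance(a: str, b: str) -> float:
    """HYPOTHETICAL combination of three distances, each in [0, 1].

    Assumes non-empty words; the measures and equal weights are
    illustrative assumptions, not taken from the paper.
    """
    lev = levenshtein(a, b) / max(len(a), len(b))
    ba, bb = bigrams(a), bigrams(b)
    union = ba | bb
    jac = 1 - len(ba & bb) / len(union) if union else 0.0
    prefix = 0
    for x, y in zip(a, b):
        if x != y:
            break
        prefix += 1
    pre = 1 - prefix / min(len(a), len(b))
    return (lev + jac + pre) / 3


def group_words(words, threshold=0.4):
    """Group words by comparing ALL pairs; threshold 0.4 is an assumption."""
    words = sorted(set(words))
    parent = {w: w for w in words}

    def find(w):
        # Union-find root lookup with path halving.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    for a, b in combinations(words, 2):
        if combined_distance(a, b) <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for w in words:
        groups.setdefault(find(w), []).append(w)
    return list(groups.values())


def lemma_for(group):
    """HYPOTHETICAL lemma rule: shortest member, ties broken alphabetically."""
    return min(group, key=lambda w: (len(w), w))


if __name__ == "__main__":
    sample = ["connect", "connects", "connected", "connection", "apple", "apples"]
    for g in group_words(sample):
        print(lemma_for(g), g)
```

With this sample, the verbal forms and the nominalization "connection" fall into one group and the two noun forms into another. Because all pairs are compared, the cost grows quadratically with the number of input words. The shortest-member rule is only a crude stand-in, and a fixed threshold may fail to group nominalized pairs with larger suffix changes, which is the kind of case the paper's combination of measures is meant to handle.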
