Abstract
Computational morphological analysis is an important first step in the automatic treatment of natural language and a useful lexicographic tool. This article describes a corpus-based approach to the morphological analysis of Swahili. We focus in particular on its ability to retrieve lemmas for word forms, and we evaluate it as a tool for corpus-based dictionary compilation.
Keywords: LEXICOGRAPHY, MORPHOLOGY, CORPUS ANNOTATION, LEMMATIZATION, MACHINE LEARNING, SWAHILI (KISWAHILI)
Highlights
Summary (translated from Dutch): More accurate computational morphological analysis of a Swahili corpus for lexicographic purposes
De Schryver and De Pauw (2007) showed how the fields of natural language processing (NLP) and lexicography can collaborate to enhance the functionality of a corpus query package (CQP) by integrating a fast and accurate data-driven part-of-speech (POS) tagger.
We investigate how another typical NLP component, namely morphological analysis, can be developed with a minimal amount of manual effort, and demonstrate how it can be used as a CQP component.
Summary
Through this operation we can automatically induce a morphologically segmented surface and lexical representation of the word form, in which we distinguish a prefix group ([P]), the root morpheme ([R]) and a suffix group ([S]).
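To make the three-part segmentation concrete, the following is a minimal sketch of how a segmented word form could be stored and serialized. The class name `Segmentation`, the function `parse_bracketed`, the `+` separator between morphemes, and the exact bracketed string layout are illustrative assumptions, not the notation or implementation used by the authors. The example word is the Swahili verb form anapenda, segmented as a-na-pend-a.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segmentation:
    """Hypothetical container for a segmented word form: prefix group, root, suffix group."""
    prefixes: List[str]   # morphemes in the prefix group [P]
    root: str             # the root morpheme [R]
    suffixes: List[str]   # morphemes in the suffix group [S]

    def surface(self) -> str:
        """Concatenate all morphemes back into the unsegmented word form."""
        return "".join(self.prefixes) + self.root + "".join(self.suffixes)

    def bracketed(self, sep: str = "+") -> str:
        """Serialize with [P], [R] and [S] markers (assumed layout, for illustration only)."""
        return f"[P]{sep.join(self.prefixes)}[R]{self.root}[S]{sep.join(self.suffixes)}"


def parse_bracketed(text: str, sep: str = "+") -> Segmentation:
    """Parse a string produced by Segmentation.bracketed back into a Segmentation."""
    if not text.startswith("[P]") or "[R]" not in text or "[S]" not in text:
        raise ValueError(f"not a [P]...[R]...[S]... string: {text!r}")
    prefix_part, rest = text[len("[P]"):].split("[R]", 1)
    root, suffix_part = rest.split("[S]", 1)
    # An empty group is represented by an empty list rather than [""].
    prefixes = prefix_part.split(sep) if prefix_part else []
    suffixes = suffix_part.split(sep) if suffix_part else []
    return Segmentation(prefixes, root, suffixes)


# Example: anapenda = a (subject prefix) + na (tense) + pend (root) + a (final vowel)
example = Segmentation(prefixes=["a", "na"], root="pend", suffixes=["a"])
```

Keeping the root separate from the two affix groups means a corpus query tool could group word forms by root without re-running the analysis, although how the root maps to a dictionary lemma is left open here.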