Abstract
Computational morphological analysis is an important first step in the automatic treatment of natural language and a useful lexicographic tool. This article describes a corpus-based approach to the morphological analysis of Swahili. We particularly focus our discussion on its ability to retrieve lemmas for word forms and evaluate it as a tool for corpus-based dictionary compilation.
Keywords: LEXICOGRAPHY, MORPHOLOGY, CORPUS ANNOTATION, LEMMATIZATION, MACHINE LEARNING, SWAHILI (KISWAHILI)
Highlights
Summary: More accurate computational morphological analysis of a Swahili corpus for lexicographic purposes
De Schryver and De Pauw (2007) showed how the fields of natural language processing (NLP) and lexicography can collaborate to enhance the functionality of a corpus query package (CQP) by integrating a fast and accurate data-driven part-of-speech (POS) tagger.
We investigate how another typical NLP component, namely morphological analysis, can be developed with a minimal amount of manual effort, and demonstrate how it can be used as a CQP component.
Summary
Through this operation we can automatically induce a morphologically segmented surface and lexical representation of the word form, in which we distinguish a prefix group ([P]), the root morpheme ([R]) and a suffix group ([S]).
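To make the three-part segmentation concrete, the following Python sketch shows one possible way to hold a word form split into a prefix group, a root and a suffix group. It is an illustration only: the `Segmentation` class, the `+` morpheme separator and the bracketed output layout are assumptions made for this example and are not the notation or implementation used in the article. The example word anasoma ("he/she is reading") is segmented by hand here, not by the system described.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segmentation:
    """Hypothetical container for a segmented word form (not from the article)."""
    prefixes: List[str]   # prefix group [P]
    root: str             # root morpheme [R]
    suffixes: List[str]   # suffix group [S]

    def surface(self) -> str:
        # Concatenate all morphemes to recover the surface word form.
        return "".join(self.prefixes) + self.root + "".join(self.suffixes)

    def bracketed(self) -> str:
        # Assumed display layout: group label followed by '+'-joined morphemes.
        p = "+".join(self.prefixes)
        s = "+".join(self.suffixes)
        return f"[P]{p} [R]{self.root} [S]{s}"


# Hand-segmented example: a- (subject marker), -na- (present tense),
# -som- (root 'read'), -a (final vowel).
seg = Segmentation(prefixes=["a", "na"], root="som", suffixes=["a"])
print(seg.surface())
print(seg.bracketed())
```

Keeping the root in its own field is what makes lemma retrieval straightforward in a layout like this: a lemma can be derived from the root alone, independently of the prefix and suffix groups.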