Abstract

Computational morphological analysis is an important first step in the automatic treatment of natural language and a useful lexicographic tool. This article describes a corpus-based approach to the morphological analysis of Swahili. We particularly focus our discussion on its ability to retrieve lemmas for word forms and evaluate it as a tool for corpus-based dictionary compilation.

Keywords: LEXICOGRAPHY, MORPHOLOGY, CORPUS ANNOTATION, LEMMATIZATION, MACHINE LEARNING, SWAHILI (KISWAHILI)

Highlights

  • Summary: More accurate computational morphological analysis of a Swahili corpus for lexicographic purposes

  • In De Schryver and De Pauw (2007) it was shown how the fields of natural language processing (NLP) and lexicography can collaborate towards enhancing the functionality of a corpus query package (CQP), by integrating a fast and accurate data-driven part-of-speech (POS) tagger

  • We investigate how another typical NLP component, namely morphological analysis, can be developed with a minimal amount of manual effort, and demonstrate how it can be used as a CQP component


Summary

Introduction

Summary: More accurate computational morphological analysis of a Swahili corpus for lexicographic purposes. Through this operation we can automatically induce a morphologically segmented surface and lexical representation of the word form, in which we distinguish a prefix group ([P]), the root morpheme ([R]) and a suffix group ([S]).
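
To make the three-part segmentation concrete, the following is a minimal Python sketch of how a segmented word form could be stored and printed. It is an illustration only, not the authors' implementation: the class name `Segmentation`, its methods, the `+` separator and the bracketed output layout are all assumptions introduced here, and the example word (ninasoma, "I am reading", segmented as ni-na-som-a) is chosen for illustration rather than taken from the article.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Segmentation:
    """Hypothetical container for one segmented word form."""
    prefixes: Tuple[str, ...]   # prefix group [P]
    root: str                   # root morpheme [R]
    suffixes: Tuple[str, ...]   # suffix group [S]

    def surface(self) -> str:
        # Concatenate all morphemes to recover the unsegmented word form.
        return "".join(self.prefixes) + self.root + "".join(self.suffixes)

    def bracketed(self) -> str:
        # Render the groups with [P], [R], [S] labels (layout is an assumption).
        parts = []
        if self.prefixes:
            parts.append("+".join(self.prefixes) + "[P]")
        parts.append(self.root + "[R]")
        if self.suffixes:
            parts.append("+".join(self.suffixes) + "[S]")
        return " ".join(parts)


# Illustrative example: ni (subject prefix) + na (tense) + som (root) + a (final vowel)
example = Segmentation(prefixes=("ni", "na"), root="som", suffixes=("a",))
print(example.surface())
print(example.bracketed())
```

Keeping the root separate from the affix groups is what makes lemma retrieval straightforward in such a representation: the root can be read off directly instead of being recovered by stripping affixes at query time.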
