Abstract

Because Northern Sotho is written disjunctively, a morphological analyser has difficulty analysing Northern Sotho verbs correctly. To overcome this obstacle, a tokeniser that can isolate verbs from raw text needs to be created. The verbal element a ka be a se a re šadišetša 'he had not left us', for example, consists of eight separately written parts, which would be difficult to extract from running text. The tokeniser will prevent over-analysis and unnecessary morphological ambiguity. A morpheme such as se, if it is not first tokenised, could be analysed ambiguously as a subject concord, an object concord, a demonstrative pronoun, a negative marker or an auxiliary verb stem. With tokenisation, this ambiguity is removed, because the position of the morpheme within the token allows it to be analysed more accurately. This article focuses on the description of the verbal segment in current Northern Sotho grammars. The different types of verbal elements are investigated, as are all the verbal prefixes that may form part of the verbal segment. Terminological issues surrounding so-called 'deficient verbs' are addressed, and a framework is proposed for the design of a tokeniser that provides for all the verbal prefixes.
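The grouping idea can be illustrated with a minimal Python sketch. It is not the framework proposed in the article: the prefix inventory ASSUMED_PREFIXES is hypothetical and contains only the forms seen in the example above, and the rule that any other word closes the verbal segment is a simplifying assumption made purely for illustration.

```python
# Hypothetical, illustrative inventory: only the separately written forms
# from the example "a ka be a se a re šadišetša". A real tokeniser would
# need the full set of verbal prefixes described in the grammars.
ASSUMED_PREFIXES = {"a", "ka", "be", "se", "re"}


def group_verbal_segments(words, prefixes=ASSUMED_PREFIXES):
    """Join each run of prefix-like words with the word that follows it.

    Simplifying assumption: the first word after a run of prefixes is
    treated as the verb stem that closes the verbal segment.
    """
    tokens = []
    run = []  # prefix-like words collected so far
    for word in words:
        if word.lower() in prefixes:
            run.append(word)
        elif run:
            # A non-prefix word after a run closes the verbal segment.
            tokens.append(" ".join(run + [word]))
            run = []
        else:
            tokens.append(word)
    # Prefix-like words left over at the end stay as separate tokens.
    tokens.extend(run)
    return tokens


def positions(token):
    """Pair every part of a grouped token with its position in the token."""
    return list(enumerate(token.split()))


segment = group_verbal_segments("a ka be a se a re šadišetša".split())[0]
print(positions(segment))
```

Once the eight parts are held together as one token, a later analysis step can consult the position of se within that token rather than treating it as a free-standing word.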
