Strategies for building wordnets for under-resourced languages: The case of African languages

Sonja E Bosch, Marissa Griesel

doi:10.4102/lit.v38i1.1351

Abstract

The African Wordnet Project (AWN) aims to build wordnets for five African languages: Setswana, isiXhosa, isiZulu, Sesotho sa Leboa (also referred to as Sepedi or Northern Sotho) and Tshivenda. Currently, the so-called expand model, based on the structure of the English Princeton WordNet (PWN), is used to continually develop the African Wordnets manually. This is labour-intensive work that must be performed by linguistic experts, guided by several considerations such as the level of lexicalisation of a term in the African language. Until now, linguists have been responsible for identifying and translating appropriate synsets without much help from electronic resources, because in the case of African languages even basic resources such as computer-readable and electronic bilingual wordlists are usually not freely available. Methods to speed up the manual development of synsets and ease the workload of the human language experts were recently investigated. These centred on utilising the minimal amount of information available in bilingual dictionaries to identify synsets in the PWN that should be included in the AWN, transferring information from dictionaries to the wordnet and presenting the potential synsets to linguists for final approval and inclusion in the wordnets. In this article, we describe the methodology developed for building the African Wordnets, a potentially significant resource for natural language processing applications. Available resources that could be taken advantage of and resources that had to be developed are investigated, and initial results and future plans are explained.
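The dictionary-driven workflow described in the abstract can be pictured with a small sketch. It is not the project's actual tooling: the `Synset` and `Candidate` classes, the `propose_candidates` and `review` functions, the synset IDs and the glosses are all hypothetical, and the three-entry synset list merely stands in for the PWN. The two isiZulu entries in the wordlist are illustrative only. The idea is simply that a PWN synset becomes a candidate when one of its English lemmas appears in the bilingual wordlist, and that every candidate stays pending until a linguist decides on it.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Synset:
    """A toy stand-in for one PWN synset (IDs and glosses are made up)."""
    synset_id: str
    lemmas: Tuple[str, ...]
    gloss: str


@dataclass
class Candidate:
    """A proposed AWN synset awaiting a linguist's decision."""
    synset_id: str
    english_lemma: str
    translations: List[str]
    gloss: str
    status: str = "pending"


def propose_candidates(wordlist: Dict[str, List[str]],
                       synsets: List[Synset]) -> List[Candidate]:
    """Propose every synset that has at least one lemma in the wordlist."""
    candidates = []
    for synset in synsets:
        for lemma in synset.lemmas:
            translations = wordlist.get(lemma)
            if translations:
                candidates.append(
                    Candidate(synset.synset_id, lemma,
                              list(translations), synset.gloss)
                )
                break  # one candidate per synset is enough for review
    return candidates


def review(candidate: Candidate, approved: bool) -> Candidate:
    """Record the linguist's decision; nothing is included automatically."""
    candidate.status = "approved" if approved else "rejected"
    return candidate


# Hypothetical sample data: a tiny "PWN" and an English-to-isiZulu wordlist.
SAMPLE_SYNSETS = [
    Synset("S1", ("dog", "domestic dog"), "a domesticated canine"),
    Synset("S2", ("house",), "a building people live in"),
    Synset("S3", ("snowmobile",), "a motor vehicle for travel on snow"),
]
SAMPLE_WORDLIST = {"dog": ["inja"], "house": ["indlu"]}

candidates = propose_candidates(SAMPLE_WORDLIST, SAMPLE_SYNSETS)
for candidate in candidates:
    print(candidate.synset_id, candidate.english_lemma,
          candidate.translations, candidate.status)
```

In this sketch the synset without a dictionary match (`S3`) is never proposed, which mirrors the role of the linguist in judging whether a concept is lexicalised in the target language at all.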

Highlights

A wordnet is an electronic lexical database consisting of words that are grouped into sets of synonyms called synsets and linked by conceptual-semantic and lexical relations (Miller 1995).
Much has been written about the resource scarceness of the African languages.
Wordnets aim to serve as direct sources of data for further human language technology and linguistic research, and to create more intricate resources such as semantically tagged corpora, information retrieval systems and the like.

Summary

Introduction

Introduction and aims

A wordnet is an electronic lexical database consisting of words that are grouped into sets of synonyms called synsets and linked by conceptual-semantic and lexical relations (Miller 1995). The interlinked synsets form an extensive semantic network, the digital format of which allows both manual and automatic searches for words that are meaningfully related to one another. In this regard, Fellbaum (1998:7) explains: […]

Van der Spuy and Flach (2010:1022) point out that the complex morphology of isiZulu poses a particular challenge for computational analysis, because: Words usually incorporate both prefixes and suffixes, and there can be several of each. The complexities involved are exacerbated by the fact that a considerable number of affixes, especially prefixes, have allomorphic forms.
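The definition of a wordnet given at the start of this introduction (synsets linked by relations, searchable automatically) can be illustrated with a minimal sketch. The `MiniWordnet` class, its method names and the synset IDs are assumptions made for illustration and do not reflect the PWN or AWN data format; hypernymy is used only as one example of a conceptual-semantic relation.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class MiniWordnet:
    """A toy semantic network: synsets as nodes, labelled relations as edges."""

    def __init__(self) -> None:
        self.synsets: Dict[str, Set[str]] = {}
        self.relations: Dict[Tuple[str, str], Set[str]] = defaultdict(set)

    def add_synset(self, synset_id: str, lemmas: List[str]) -> None:
        self.synsets[synset_id] = set(lemmas)

    def link(self, source_id: str, relation: str, target_id: str) -> None:
        self.relations[(source_id, relation)].add(target_id)

    def _synsets_of(self, word: str) -> List[str]:
        return [sid for sid, lemmas in self.synsets.items() if word in lemmas]

    def synonyms(self, word: str) -> List[str]:
        """Other lemmas that share a synset with the word."""
        found: Set[str] = set()
        for sid in self._synsets_of(word):
            found |= self.synsets[sid]
        found.discard(word)
        return sorted(found)

    def related(self, word: str, relation: str) -> List[str]:
        """Lemmas of synsets reachable from the word via one relation."""
        found: Set[str] = set()
        for sid in self._synsets_of(word):
            for target in self.relations.get((sid, relation), set()):
                found |= self.synsets[target]
        return sorted(found)


# Hypothetical sample data.
net = MiniWordnet()
net.add_synset("S1", ["dog", "domestic dog"])
net.add_synset("S2", ["canine", "canid"])
net.link("S1", "hypernym", "S2")

print(net.synonyms("dog"))             # ['domestic dog']
print(net.related("dog", "hypernym"))  # ['canid', 'canine']
```

Storing relations between synsets rather than between individual words is what lets a single link serve every synonym in the set.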



Journal: Literator	Publication Date: Mar 31, 2017
Citations: 31	License type: CC BY 4.0


Similar Papers

Requests in a South African variety of English
Luanga A Kasanga
World Englishes | VOL. 25
01 Feb 2006

The onomastic possibility of renaming the Sepedi and Sesotho sa Leboa (Northern Sotho) language names to restore peace, dignity and solidarity
Tebogo J Rakgogo ... Evangeline B Zungu
Literator | VOL. 42
26 Aug 2021

Borrowing and Loan Words: The Lemmatizing of Newly Acquired Lexical Items in Sesotho sa Leboa
V.M Mojela
Lexikos | VOL. 20
13 Dec 2010

Word-formation strategies and processes in the creation of synsets for the African wordnet
Stanley Madonsela
Southern African Linguistics and Applied Language Studies | VOL. 35
03 Apr 2017
