Topic modelling to support English text selection for translation into South Africa's other official languages

Jocelyn Mazarura, Febe De Wet

doi:10.55492/dhasa.v4i01.4447

Abstract

Appropriate training data is a prerequisite for the development of natural language processing (NLP) techniques. Vast amounts of language data are typically required to develop NLP tools that perform at a state-of-the-art level. Such abundant resources are currently available for only a few languages. The remaining languages have to find alternative ways to become "NLP-enabled". The aim of the study reported here is to make more language data available to support NLP development in the official languages of South Africa. In this paper we present the idea of generating text data by means of translation. We also propose the use of topic modelling to identify text in a highly resourced source language that will yield meaningful translations in under-resourced target languages. More specifically, the paper describes how topic modelling was used to identify English Wikipedia articles that should be suitable for translation into South Africa's 10 other official languages.
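The abstract does not state which topic model or selection rule was used, so the following is only a minimal sketch of the general idea under explicit assumptions. It assumes latent Dirichlet allocation (LDA) fitted by collapsed Gibbs sampling, a toy corpus standing in for tokenised English Wikipedia articles, and a hypothetical selection rule (`select_articles`) that keeps articles whose weight on an analyst-chosen topic exceeds a threshold. The function names, hyperparameters, and threshold are all illustrative and are not taken from the paper.

```python
import random


def fit_lda(docs, num_topics, iterations=100, alpha=0.1, beta=0.01,
            num_top_words=3, seed=0):
    """Fit LDA by collapsed Gibbs sampling (assumed model, not the paper's).

    docs is a list of token lists. Returns (theta, top_words): the
    per-document topic proportions and the most frequent words per topic.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]      # topic counts per doc
    topic_word = [[0] * V for _ in range(num_topics)]  # word counts per topic
    topic_total = [0] * num_topics                     # tokens per topic
    assignments = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    # Resample each token's topic from its conditional distribution.
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta) / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    theta = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in counts]
        for counts, doc in zip(doc_topic, docs)
    ]
    top_words = [
        [vocab[v] for v in
         sorted(range(V), key=lambda v: -topic_word[t][v])[:num_top_words]]
        for t in range(num_topics)
    ]
    return theta, top_words


def select_articles(theta, topic, min_weight=0.5):
    """Hypothetical rule: indices of docs with weight >= min_weight on topic."""
    scored = [(row[topic], d) for d, row in enumerate(theta)
              if row[topic] >= min_weight]
    return [d for _, d in sorted(scored, reverse=True)]


# Toy stand-in for tokenised article text (not data from the study).
docs = [
    "river water rain flood river dam".split(),
    "school teacher learner classroom school exam".split(),
    "rain drought water dam farm river".split(),
    "exam learner school teacher reading book".split(),
    "farm maize drought rain water crop".split(),
    "book reading classroom teacher learner school".split(),
]

theta, top_words = fit_lda(docs, num_topics=2)
selected = select_articles(theta, topic=0, min_weight=0.5)
```

In practice an analyst would inspect `top_words` to decide which topics look suitable before applying a rule such as `select_articles`; which topics count as suitable for translation is a judgement this sketch does not attempt to encode.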

Journal: Journal of the Digital Humanities Association of Southern Africa (DHASA)
Publication Date: Jan 26, 2023
License type: CC BY-SA
