Abstract

Kikamba is an under-resourced language with few language technology tools, because the popular and efficient data-driven approaches for building such tools suffer from data sparseness caused by the lack of digitized corpora. To address this challenge, we have developed a computational grammar for Kikamba within the multilingual Grammatical Framework (GF) toolkit, which follows the interlingua approach to rule-based translation. We developed the grammar using a morphology-driven strategy: we first wrote regular expressions for morphological inflection and then developed the syntax rules. We evaluated the grammar on one hundred sentences in English and Kikamba. The results were encouraging: a BLEU score (n-grams up to length 4) of 83.05% and a position-independent error rate (PER) of 10.96%. This work contributes the following language technology resources for Kikamba: multilingual machine translation, a morphological analyzer, a computational grammar that provides a platform for developing multilingual applications, and the ability to generate bilingual corpora pairing Kikamba with each language currently defined in GF, which makes it easier to experiment with data-driven approaches.
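The evaluation above reports BLEU and PER. As an illustration only, the sketch below computes both metrics in Python; it is not the evaluation script used in this work, whose tokenization, smoothing and reference setup are not stated here. It assumes whitespace tokenization, one reference translation per sentence and no BLEU smoothing, and it returns fractions in [0, 1] rather than percentages. The function names `corpus_bleu` and `per` are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Hypothetical corpus-level BLEU sketch: one reference per sentence, no smoothing.

    Clipped n-gram matches for n = 1..max_n are pooled over the corpus,
    combined with a geometric mean and scaled by a brevity penalty.
    """
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            # Counter '&' keeps the minimum count, i.e. clips each
            # hypothesis n-gram by its count in the reference.
            matches[n - 1] += sum((hyp_counts & ngrams(ref, n)).values())
            totals[n - 1] += sum(hyp_counts.values())
    if min(matches) == 0:  # also covers empty hypotheses
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


def per(hypotheses, references):
    """Hypothetical position-independent error rate: words compared as bags.

    Per sentence, errors = missing reference words + surplus hypothesis words;
    the corpus PER is total errors divided by total reference length.
    """
    errors = ref_total = 0
    for hyp, ref in zip(hypotheses, references):
        correct = sum((Counter(hyp) & Counter(ref)).values())
        errors += len(ref) - correct + max(0, len(hyp) - len(ref))
        ref_total += len(ref)
    if ref_total == 0:
        raise ValueError("references must contain at least one token")
    return errors / ref_total


if __name__ == "__main__":
    refs = ["the cat sat on the mat".split()]
    scrambled = ["mat the on sat cat the".split()]
    # Same words in a different order: PER is 0.0, but no bigram matches, so BLEU is 0.0.
    print(corpus_bleu(scrambled, refs), per(scrambled, refs))
```

The scrambled sentence scores a perfect PER but a BLEU of zero, which shows how the two metrics complement each other: PER ignores word order, while BLEU's higher-order n-grams are sensitive to it.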

Highlights

  • The commonly used data-driven approaches for developing natural language processing (NLP) tools are currently unusable for under-resourced languages because of data sparsity, a problem that may not be resolved in the near future

  • We have contributed the following language technology resources for Kikamba: multilingual machine translation, a morphological analyzer, a computational grammar that provides a platform for developing multilingual applications, and the ability to generate bilingual corpora pairing Kikamba with each language currently defined in Grammatical Framework (GF), making it easier to experiment with data-driven approaches

  • Development of the Kikamba Computational Grammar is a significant milestone towards creating a standard Basic Language Resource Kit (BLARK) [14], since it will yield a morphological analyzer and, through the capabilities of Grammatical Framework, multilingual translation


Summary

Introduction

The commonly used data-driven approaches for developing natural language processing (NLP) tools are currently unusable for under-resourced languages because of data sparsity, a problem that may not be resolved in the near future. Development of the Kikamba Computational Grammar is a significant milestone towards creating a standard Basic Language Resource Kit (BLARK) [14], since it will yield a morphological analyzer and, through the capabilities of Grammatical Framework, multilingual translation. It will act as a catalyst for providing information and communication technology (ICT) in the Kikamba language, helping to bridge the digital divide. It will also provide a platform for generating parallel corpora and treebanks, which are crucial for building NLP tools with data-driven approaches. Finally, it is an effort to preserve the Kikamba language electronically so that the Kamba people are not disenfranchised in the global information space.
