Abstract
The under-resourced Kikamba language has few language technology tools since the more efficient and popular data driven approaches for developing them suffer from data sparseness due to lack of digitized corpora. To address this challenge, we have developed a computational grammar for the Kikamba language within the multilingual Grammatical Framework (GF) toolkit. GF uses the Interlingua rule-based translation approach. To develop the grammar, we used the morphology driven strategy. Therefore, we first developed regular expressions for morphology inflection and thereafter developed the syntax rules. Evaluation of the grammar was done using one hundred sentences in both English and Kikamba languages. The results were an encouraging four n-gram BLEU score of 83.05% and the Position independent error rate (PER) of 10.96%. Finally, we have made a contribution to the language technology resources for Kikamba including multilingual machine translation, a morphology analyzer, a computational grammar which provides a platform for development of multilingual applications and the ability to generate a variety of bilingual corpora for Kikamba for all languages currently defined in GF, making it easier to experiment with data driven approaches.
Highlights
The commonly used data driven approaches for developing natural language processing (NLP) tools are currently unusable with under-resourced languages due to data sparsity and this problem might not be resolved in the near future
We have made a contribution to the language technology resources for Kikamba including multilingual machine translation, a morphology analyzer, a computational grammar which provides a platform for development of multilingual applications and the ability to generate a variety of bilingual corpora for Kikamba for all languages currently defined in Grammatical Framework (GF), making it easier to experiment with data driven approaches
Development of the Kikamba Computational Grammar is a significant milestone towards the creation of standard Basic Language Resource Kit (BLARK) [14] since it will result in a Morphological analyzer and multilingual translation using the capability of Grammatical Framework
Summary
The commonly used data driven approaches for developing natural language processing (NLP) tools are currently unusable with under-resourced languages due to data sparsity and this problem might not be resolved in the near future. Development of the Kikamba Computational Grammar is a significant milestone towards the creation of standard Basic Language Resource Kit (BLARK) [14] since it will result in a Morphological analyzer and multilingual translation using the capability of Grammatical Framework. It will be a catalyst to the provision of information and communication technology (ICT) in Kikamba language, bridging the digital divide It will provide a platform for the generation of parallel corpora and treebanks, which are crucial for building NLP tools using data driven approaches. It is an electronic preservation effort for the Kikamba language so that the Kamba people are not disenfranchised in the global information space
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have