AbstractThis paper describes the development of a transformer-based text generation model for Nigerian Pidgin also known as Naijá, a popular language in West Africa. Despite its wide use, Nigerian Pidgin remains under-resourced, particularly in areas related to text generation and natural language processing. These difficulties are primarily due to technological constraints rather than the language’s fundamental attributes. There is currently a demand for Nigerian Pidgin-specific solutions because it is used in everyday communication and has a unique linguistic blend. This paper aims to close this gap by exploring the application of state-of-the-art transformer technology to develop a text generation model for Nigerian Pidgin. This work uses the public Afriberta-corpus dataset to optimize the Generative Pre-trained Transformer (GPT-2) model across a sizeable dataset. The performance evaluators, BLEU and Perplexity metrics provide a detailed breakdown of the model’s text quality and predictive accuracy. Despite the difficulties caused by a limited amount of training data, preliminary evaluations show that the model can generate coherent Nigerian Pidgin text. The performance evaluation yielded perplexity scores of 43.56 for variable target reference length and 43.26 for fixed text length. BLEU scores of 0.15 for fixed max length and 0.56 for variable reference target length. This highlights the quality of generated text and the significant improvement when the generated text length is aligned with the reference target. Our work was benchmarked against African American Vernacular (AAVE) revealing that BLEU scores for AAVE are significantly lower than those for Standard American English, with BLEU given as 0.26. Our Nigerian Pidgin model, with a BLEU score of 0.56, shows a better performance. However, both results suggest that both dialects are challenging for language models. Leveraging the pre-trained transformer-based language model and evaluation metrics, we showcase the model’s capacity for coherent Nigerian Pidgin text generation. For future research, the research work can serve as a good foundation for advancement and progress in the Nigerian Pidgin language generation and other low-resource languages.
Read full abstract