HauWE: Hausa Words Embedding for Natural Language Processing

Idris Abdulmumin,Bashir Shehu Galadanci

doi:10.1109/nigeriacomputconf45974.2019.8949674

Idris Abdulmumin, Bashir Shehu Galadanci

Open Access

PDF Available

https://doi.org/10.1109/nigeriacomputconf45974.2019.8949674

Copy DOI

Export

Save

Cite

Abstract
Full-Text PDF
Similar Papers

Abstract

Listen

Words embedding (distributed word vector representations) have become an essential component of many natural language processing (NLP) tasks such as machine translation, sentiment analysis, word analogy, named entity recognition and word similarity. Despite this, the only work that provides word vectors for Hausa language is that of Bojanowski et al. [1] trained using fastText, consisting of only a few words vectors. This work presents words embedding models using Word2Vec's Continuous Bag of Words (CBoW) and Skip Gram (SG) models. The models, hauWE (Hausa Words Embedding), are bigger and better than the only previous model, making them more useful in NLP tasks. To compare the models, they were used to predict the 10 most similar words to 30 randomly selected Hausa words. hauWE CBoW's 88.7% and hauWE SG's 79.3% prediction accuracy greatly outperformed Bojanowski et al. [1]'s 22.3%.

Full Text