AN IMPROVED HAUSA WORD STEMMING ALGORITHM

Sirajo Musa, Muhammad Muntasir Yakubu, G N Obunadike

doi:10.33003/fjs-2022-0601-899

Abstract

The explosion of scientific publications across domains, coupled with the introduction and widespread adoption of the internet over the last few decades, has made information more available than ever before. Consequently, digital storage capacity has been doubling consistently to keep pace with this geometric increase in information. In view of this, Information Retrieval (IR), nowadays considered the dominant form of information access, has become even more critical. However, the problems of using free text in indexing and retrieval, which arise from spelling mistakes, alternative spellings, affixes and abbreviations, have continued to bedevil the field of IR. To mitigate these problems, stemming algorithms were introduced in the 1960s. Stemming is an automated process of stripping the inflectional affixes from derived word forms in order to obtain the stem of the word. Because stemming is language-specific, there are stemming algorithms designed specifically for most of the major languages of the world. With a speaker population of about 150 million, the Hausa language stands in need of a better stemming algorithm. This research is an attempt to improve upon the existing Hausa word stemming algorithm. An affix-stripping method of conflation with reference lookup was used. Using Sirsat's evaluation method, this research achieved a Correctly Stemmed Word Factor (CSWF) of 96.9%, an Index Compression Factor of 74.76%, a Words Stemmed Factor (WSF) of 70.44% and an Average Word Conflation Factor of 59.47%.
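
As a rough illustration of what affix stripping with reference lookup can look like, the following minimal Python sketch first consults a lookup table and only then strips a suffix. The table contents, the suffix list, the minimum stem length and the function names are assumptions made for this example; they are not the rule set or word list used in this research. The compression function assumes the common definition of index compression, (N - S) / N, where N is the number of distinct words and S the number of distinct stems.

```python
# Hypothetical reference table: irregular forms mapped directly to a stem.
REFERENCE = {"littattafai": "littafi"}

# Hypothetical suffix list (illustrative only, not the paper's rules).
SUFFIXES = ("nsa", "nta")

# Assumed guard: never strip a word down to fewer than this many characters.
MIN_STEM_LENGTH = 3


def stem(word):
    """Return a stem via reference lookup first, then suffix stripping."""
    w = word.lower()
    if w in REFERENCE:
        return REFERENCE[w]
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= MIN_STEM_LENGTH:
            return w[: -len(suffix)]
    return w  # no rule applied; the word is left unchanged


def index_compression_factor(words):
    """Percentage reduction in distinct index terms after stemming."""
    distinct_words = {w.lower() for w in words}
    distinct_stems = {stem(w) for w in distinct_words}
    return 100.0 * (len(distinct_words) - len(distinct_stems)) / len(distinct_words)
```

In this sketch the lookup runs before any stripping, so listed irregular forms are never altered by the suffix rules, and the length guard keeps very short words intact.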
