Khoja Stemmer Research Articles

With the advent of AI text-based tools and applications, the need to introduce and investigate text-processing tools has grown. NLP tools and techniques have developed rapidly for some languages, such as English. Other languages, such as Arabic, still need more methods and techniques to be introduced and examined. In this study, we present a model for classifying customer reviews written in Arabic. The HARD dataset is used as the benchmark dataset for this work. The study adopts four machine learning and deep learning classifiers (CNN, RNN, NB, LR). The texts were cleaned using data cleaning techniques, and stemming was applied with three stemmers (Khoja Stemmer, Snowball Stemmer, Tashaphyne Stemmer). Two feature extraction methods were used (TF-IDF, N-gram). The best performance came from the combination CNN + Snowball Stemmer + N-gram, with an accuracy of 93.5%. The results indicate that some classifiers are sensitive to the choice of tools, and that accuracy can also be affected by the feature extraction method used. The model also showed that colloquial Arabic can cause limitations, because words can carry different meanings in different dialects across regions or countries. The results of the study open the door to exploring other tools and methods to enrich Arabic natural language processing and contribute to the development of new applications that support Arabic content.
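The two feature extraction methods named in the abstract can be illustrated with a small pure-Python sketch. This is not the study's pipeline: the abstract does not state which TF-IDF variant or n-gram size was used, so the weighting below (relative term frequency times log(N/df)), the function names `word_ngrams` and `tfidf`, and the toy reviews are all assumptions made for illustration.

```python
import math
from collections import Counter

def word_ngrams(tokens, n):
    """Return the word n-grams of a token list, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def tfidf(docs, n=1):
    """Weight the word n-grams of each document with a basic TF-IDF.

    Assumed variant: tf = count / number of n-grams in the document,
    idf = log(number of documents / document frequency).
    """
    grams = [word_ngrams(doc, n) for doc in docs]
    # Document frequency: in how many documents each n-gram occurs.
    df = Counter(term for g in grams for term in set(g))
    num_docs = len(docs)
    vectors = []
    for g in grams:
        counts = Counter(g)
        total = len(g) or 1  # avoid division by zero for very short documents
        vectors.append({
            term: (count / total) * math.log(num_docs / df[term])
            for term, count in counts.items()
        })
    return vectors

# Two toy, already tokenized reviews (hypothetical data, not from HARD).
reviews = [
    ["فندق", "جميل", "جدا"],
    ["فندق", "سيء", "جدا"],
]
unigram_vectors = tfidf(reviews, n=1)
bigram_vectors = tfidf(reviews, n=2)
```

With this weighting, a term that occurs in every review (here "فندق") gets weight zero, while a bigram such as "فندق جميل" keeps word-order information that single words lose. The resulting dictionaries would then be passed to a classifier such as NB or LR.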

Many experts in information technology have designed and developed algorithms to solve stemming problems, especially for Arabic. Despite the many stemming studies for Arabic, there is no standard stemming algorithm for analysing stemming accuracy on the text of the Koran. Developing stemming for the Koran is significant because it supports the Sharaf classification, which helps in understanding the meaning of every word in the Koran. One stemming algorithm for finding the base form of an Arabic word is the Khoja Stemmer. It tries to find the root of an Arabic word by removing the longest prefix and the longest suffix, then determining the root of the remaining word using a root word dictionary. In this study, the Khoja Stemmer that was built achieved an average stemming rate of 95.295% on the Koran. However, manual checking shows that the root words produced by the Khoja Stemmer still contain several errors. Thus, an Al-Quran dictionary is needed to analyse each stemming result the Khoja Stemmer produces for the Koran.
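The procedure described above (strip the longest prefix, strip the longest suffix, then look the remainder up in a root dictionary) can be sketched as follows. This is a simplified illustration, not the Khoja Stemmer itself: the affix lists, the three-entry root dictionary, the minimum remainder length, and the function names are assumptions, and the sketch omits any further processing a full stemmer performs when the remainder is not already a root.

```python
# Toy data (assumptions for illustration only, far smaller than real resources).
PREFIXES = ["وال", "بال", "ال", "و", "ب"]
SUFFIXES = ["ات", "ون", "ين", "ها", "ة"]
ROOTS = {"كتب", "درس", "علم"}

MIN_REMAINDER = 3  # assumed: never strip a word below three letters

def strip_longest_prefix(word):
    """Remove the longest matching prefix, if enough letters remain."""
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_REMAINDER:
            return word[len(prefix):]
    return word

def strip_longest_suffix(word):
    """Remove the longest matching suffix, if enough letters remain."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_REMAINDER:
            return word[:-len(suffix)]
    return word

def find_root(word):
    """Return the dictionary root of a word, or None if the lookup fails."""
    remainder = strip_longest_suffix(strip_longest_prefix(word))
    return remainder if remainder in ROOTS else None

for word in ["والدرس", "درسها", "الكتب", "مدرسة"]:
    print(word, "->", find_root(word))
```

The last word, "مدرسة", returns None here because the remainder "مدرس" is not in the toy dictionary. Cases like this are where dictionary coverage matters, which is in line with the abstract's call for a dedicated dictionary to check stemming results.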

Articles published on Khoja Stemmer

Evaluating The Impact of Feature Extraction Techniques on Arabic Reviews Classification

Analysis and implementation of computer-based system development of stemming algorithm for finding Arabic root word

A novel root based Arabic stemmer
