Abstract

With the large number of documents available on the internet, in email, and in digital libraries, document classification has become an increasingly important task. It is commonly preceded by feature selection, which chooses an appropriate subset of features to improve classification accuracy. Most feature-selection-based text classification methods rely on building a term frequency-inverse document frequency (TF-IDF) feature vector, which is often inefficient. In addition, most document classification studies focus on English. This paper addresses Arabic text classification, which has received less attention because of the complexity of the Arabic language. A new feature selection method based on the firefly algorithm is proposed. The firefly algorithm has been applied successfully to various combinatorial problems, but it has not previously been used for feature selection in Arabic text classification. To validate the technique, a Support Vector Machine classifier is used together with three evaluation measures: precision, recall, and F-measure. Experiments are performed on the real-world OSAC dataset, along with a comparison against state-of-the-art methods. The proposed method achieves a precision of 0.994. The results confirm that the proposed feature selection method improves Arabic text classification accuracy.
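
As an illustration of the three evaluation measures named above, the following is a minimal sketch of how precision, recall, and F-measure can be computed for one class from true and predicted labels. It uses the standard definitions with the balanced F-measure (F1); the function name `precision_recall_f1` and the example labels are assumptions made for illustration and are not taken from the paper, which may average the measures across classes differently.

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    y_true: Sequence[str], y_pred: Sequence[str], positive: str
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for the class `positive`.

    Hypothetical helper for illustration; not the paper's evaluation code.
    """
    pairs = list(zip(y_true, y_pred))
    # True positives: documents of the class that were predicted as the class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False positives: documents of other classes predicted as the class.
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # False negatives: documents of the class predicted as something else.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return precision, recall, f1


# Made-up labels: three "sport" documents and one "health" document.
true_labels = ["sport", "sport", "sport", "health"]
predicted = ["sport", "sport", "health", "health"]
p, r, f = precision_recall_f1(true_labels, predicted, positive="sport")
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

In this made-up example every document predicted as "sport" really is one, so precision is perfect, while one "sport" document is missed, which lowers recall and therefore the F-measure.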
