Abstract
Feature subset selection (FSS) is an important step in building effective text classification (TC) systems. This paper presents an empirical comparison of seventeen traditional FSS metrics for TC tasks. The study is restricted to the support vector machine (SVM) classifier and to Arabic articles. The evaluation used a corpus of 7842 documents independently classified into ten categories. Experimental results are reported in terms of macro-averaged precision, macro-averaged recall, and macro-averaged F1 measures. The results reveal that the Chi-square and Fallout FSS metrics work best for Arabic TC tasks.
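As an illustration of the evaluation measures named above, the following is a minimal sketch of macro-averaged precision, recall, and F1. It is not taken from the paper: the function name `macro_scores` and the toy labels are hypothetical, and it assumes that macro-averaged F1 is the unweighted mean of the per-category F1 scores (another common convention computes F1 from the macro-averaged precision and recall).

```python
from collections import Counter


def macro_scores(y_true, y_pred):
    """Return (macro precision, macro recall, macro F1).

    Assumption: macro F1 is the unweighted mean of per-category F1 scores,
    taken over the categories that occur in y_true.
    """
    labels = sorted(set(y_true))
    tp, fp, fn = Counter(), Counter(), Counter()

    # Count true positives, false positives and false negatives per category.
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1
            fn[truth] += 1

    precisions, recalls, f1s = [], [], []
    for c in labels:
        p_den = tp[c] + fp[c]
        r_den = tp[c] + fn[c]
        p = tp[c] / p_den if p_den else 0.0
        r = tp[c] / r_den if r_den else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    # Every category contributes equally, regardless of its size.
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Toy example with two hypothetical categories.
y_true = ["sports", "sports", "politics", "politics"]
y_pred = ["sports", "politics", "politics", "politics"]
macro_p, macro_r, macro_f1 = macro_scores(y_true, y_pred)
```

Because each category is weighted equally, small categories influence the macro-averaged scores as much as large ones.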