Abstract

Feature selection is one of several factors affecting the performance of text classification systems. It aims to choose a representative subset of all features in order to reduce the complexity of the classification problem. Usually, a single feature selection method is used. For English, several studies have examined combining different feature selection methods; to the best of our knowledge, no such study has been reported for Arabic text classification. In this study, we examined the effect on Arabic text classification accuracy of combining five feature selection methods: chi-square (CHI), information gain (IG), the Galavotti-Sebastiani-Simi coefficient (GSS), the Ng-Goh-Low coefficient (NGL) and relevancy score (RS). Two combination approaches were used: intersection (AND) and union (OR). A Naive Bayes (NB) classifier was used to classify a Saudi Press Agency dataset of 6,300 texts divided evenly into six classes. Three feature representation schemes were used: Boolean, TF-IDF and LTC. The experiments show a slight improvement in classification accuracy when two or three feature selection methods are combined. No improvement in accuracy was seen when four or all five methods were combined.
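The AND/OR combination of feature selection methods can be illustrated with a small sketch. It is not the study's implementation. It assumes a hypothetical toy corpus summarised as 2x2 term/class contingency counts. It implements only two of the five methods (CHI and GSS) and scores terms against a single class. For each method it keeps the top k = 2 terms. The term names, counts, the value of k and the helper names (`chi_square`, `gss`, `top_k`, `combine`) are all illustrative assumptions.

```python
# Hypothetical 2x2 contingency counts for ONE class c in a toy corpus of
# N = 20 documents (10 in c, 10 not in c). For each term t:
#   n11 = docs in c containing t      n10 = docs not in c containing t
#   n01 = docs in c without t         n00 = docs not in c without t
COUNTS = {
    "goal":    (8, 0, 2, 10),
    "match":   (6, 4, 4, 6),
    "striker": (1, 0, 9, 10),
    "said":    (3, 3, 7, 7),
}


def chi_square(n11, n10, n01, n00):
    """Chi-square (CHI) statistic between a term and a class."""
    n = n11 + n10 + n01 + n00
    den = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    return n * (n11 * n00 - n10 * n01) ** 2 / den if den else 0.0


def gss(n11, n10, n01, n00):
    """GSS coefficient: P(t,c)P(~t,~c) - P(t,~c)P(~t,c)."""
    n = n11 + n10 + n01 + n00
    return (n11 * n00 - n10 * n01) / (n * n)


def top_k(scores, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return set(ranked[:k])


def combine(feature_sets, mode):
    """Combine selected feature sets by intersection (AND) or union (OR)."""
    if mode == "AND":
        return set.intersection(*feature_sets)
    if mode == "OR":
        return set.union(*feature_sets)
    raise ValueError("mode must be 'AND' or 'OR'")


chi_scores = {t: chi_square(*c) for t, c in COUNTS.items()}
gss_scores = {t: gss(*c) for t, c in COUNTS.items()}

# Assumed selection rule: keep the top 2 terms per method, then combine.
selected = [top_k(chi_scores, 2), top_k(gss_scores, 2)]
and_set = combine(selected, "AND")
or_set = combine(selected, "OR")
print(sorted(and_set))  # ['goal']
print(sorted(or_set))   # ['goal', 'match', 'striker']
```

In this toy data, CHI favours the rare but perfectly class-specific term `striker`, while GSS favours the more frequent `match`. The intersection therefore keeps only `goal`, and the union keeps all three terms. In a multi-class setting such as the six-class dataset, per-class scores would still have to be aggregated or selected class by class. The abstract does not say how the study did this.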
