Abstract

Algorithmic search approaches are ineffective at addressing the problem of multi-dimensional feature selection for document categorization. This study proposes a meta-heuristic search approach for optimal feature selection. Elephant Optimization (EO) and Ant Colony Optimization (ACO) algorithms, coupled with Naïve Bayes (NB), Support Vector Machine (SVM), and J48 classifiers, were used to highlight the optimization capability of meta-heuristic search for the multi-dimensional feature selection problem in document categorization. In addition, the feature selection performance of the two meta-heuristic approaches (EO and ACO) was compared with that of the conventional Best First Search (BFS) and Greedy Stepwise (GS) algorithms on news document categorization. The comparative results showed that globally optimal feature subsets were attained using adaptive parameter tuning in the meta-heuristic feature selection optimization scheme. In addition, the number of features in the selected subsets was reduced dramatically for document classification.

Highlights

  • Document classification has become a key technology for knowledge discovery in applications such as business intelligence, medical intelligence, and social media intelligence models

  • The best accuracy was achieved with the J48 classifier, while the lowest accuracy was obtained with Naïve Bayes

  • The peak accuracy varied across classifier models, and J48 was the most appropriate classifier when working with the ant colony optimization (ACO)-based search scheme for filter


Summary

Introduction

Document classification has become a key technology for knowledge discovery in applications such as business intelligence, medical intelligence, and social media intelligence models. Selecting an optimal feature subset from high-dimensional data for an accurate classification model is becoming a difficult computational research problem. The nature of text and the role of feature selection are described to highlight the research problem. This includes a description of the unstructured, multi-dimensional properties of text; the challenges associated with searching for globally optimal features in text feature selection; the difference between meta-heuristic search and conventional search for feature selection, which emphasizes the research gap in multi-dimensional feature selection for text document categorization; and the theories and calculations behind the swarm intelligence-based Ant Colony and nature-inspired Elephant search policies. Three classification learning models were used to evaluate the quality of the feature subsets selected by the meta-heuristic search. The evaluation schemes for measuring the performance of the proposed model are explained.

