Abstract

The high-dimensional data are often characterized by more number of features with less number of instances. Many of the features are irrelevant and redundant. These features may be especially harmful in case of extreme number of features carries the problem of memory usage in order to represent the datasets. On the other hand relatively small training set, where this irrelevancy and redundancy makes harder to evaluate. Hence, in this paper we propose an efficient feature selection and classification method based on Particle Swarm Optimization (PSO) and rough sets. In this study, we propose the inconsistency handler algorithm for handling inconsistency in dataset, new quick reduct algorithm for handling irrelevant/noisy features and fitness function with three parameters, the classification quality of feature subset, remaining features and the accuracy of approximation. The proposed method is compared with two traditional and three fusion of PSO and rough set-based feature selection methods. In this study, Decision Tree and Naive Bayes classifiers are used to calculate the classification accuracy of the selected feature subset on nine benchmark datasets. The result shows that the proposed method can automatically selects small feature subset with better classification accuracy than using all features. The proposed method also outperforms the two traditional and three existing PSO and rough set-based feature selection methods in terms of the classification accuracy, cardinality of feature and stability indices. It is also observed that with increased weight on the classification quality of feature subset of the fitness function, there is a significant reduction in the cardinality of features and also achieve better classification accuracy as well.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call