Abstract

This paper presents a two-phase scheme to select reduced number of features from a dataset using Genetic Algorithm (GA) and testing the classification accuracy (CA) of the dataset with the reduced feature set. In the first phase of the proposed work, an unsupervised approach to select a subset of features is applied. GA is used to select stochastically reduced number of features with Sammon Error as the fitness function. Different subsets of features are obtained. In the second phase, each of the reduced features set is applied to test the CA of the dataset. The CA of a data set is validated using supervised k-nearest neighbor (k-nn) algorithm. The novelty of the proposed scheme is that each reduced feature set obtained in the first phase is investigated for CA using the k-nn classification with different Minkowski metric i.e. non-Euclidean norms instead of conventional Euclidean norm (L2). Final results are presented in the paper with extensive simulations on seven real and one synthetic, data sets. It is revealed from the proposed investigation that taking different norms produces better CA and hence a scope for better feature subset selection.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.