Abstract

We present an integrated algorithm for simultaneous feature selection (FS) and designing of diverse classifiers using a steady state multiobjective genetic programming (GP), which minimizes three objectives: 1) false positives (FPs); 2) false negatives (FNs); and 3) the number of leaf nodes in the tree. Our method divides a c -class problem into c binary classification problems. It evolves c sets of genetic programs to create c ensembles. During mutation operation, our method exploits the fitness as well as unfitness of features, which dynamically change with generations with a view to using a set of highly relevant features with low redundancy. The classifiers of i th class determine the net belongingness of an unknown data point to the i th class using a weighted voting scheme, which makes use of the FP and FN mistakes made on the training data. We test our method on eight microarray and 11 text data sets with diverse number of classes (from 2 to 44), large number of features (from 2000 to 49 151), and high feature-to-sample ratio (from 1.03 to 273.1). We compare our method with a bi-objective GP scheme that does not use any FS and rule size reduction strategy. It depicts the effectiveness of the proposed FS and rule size reduction schemes. Furthermore, we compare our method with four classification methods in conjunction with six features selection algorithms and full feature set. Our scheme performs the best for 380 out of 474 combinations of data sets, algorithm and FS method.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call