Abstract

Ensemble gene (feature) selection is a promising new strategy that offers benefits including more stable gene lists and improved classification results. The ensemble is formed by running feature selection multiple times and aggregating the runs into a single result. The critical question is how many iterations of feature selection are appropriate: too few can degrade classification performance, while too many incur unnecessary computational cost. The goal is to choose a number of iterations that maximizes classification performance without expending excessive computation. This paper presents an in-depth study of how the number of feature selection iterations affects classification performance. We employ eleven DNA microarray datasets, to which we apply various ensemble methods, feature selection techniques, classifiers, and feature subset sizes. The results show that 10 iterations of feature ranking during ensemble feature selection are not sufficient to optimize classification results and that a larger number of iterations (20 or 50) is required. However, there is very little distinction between 20 and 50 iterations, as both produce very similar classification results. We therefore recommend 20 iterations, which perform similarly to 50 while requiring much less computation time. To our knowledge, no previous study has examined the effect of the number of iterations on ensemble feature selection as expansively as this one.
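
To make the role of the iteration count concrete, the following is a minimal sketch of ensemble feature ranking, not the procedure used in the study. The abstract does not specify the ensemble methods, rankers, or aggregation rules, so several choices here are assumptions: each iteration draws a bootstrap sample, features are scored by a toy class-mean difference, and the per-iteration ranks are combined by their mean. The names `score_features`, `ranks_from_scores`, and `ensemble_rank` are hypothetical, and the data are synthetic.

```python
import random
from statistics import mean


def score_features(X, y):
    """Toy ranker (an assumption): absolute difference of class means per feature."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(abs(mean(pos) - mean(neg)))
    return scores


def ranks_from_scores(scores):
    """Convert scores to ranks, where rank 1 is the highest-scoring feature."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    ranks = [0] * len(scores)
    for position, j in enumerate(order, start=1):
        ranks[j] = position
    return ranks


def ensemble_rank(X, y, n_iterations=20, subset_size=2, seed=0):
    """Run the ranker on n_iterations bootstrap samples and aggregate by mean rank.

    Assumes binary labels 0/1 with both classes present in y.
    """
    rng = random.Random(seed)
    n = len(X)
    all_ranks = []
    for _ in range(n_iterations):
        # Redraw the bootstrap sample until it contains both classes.
        while True:
            idx = [rng.randrange(n) for _ in range(n)]
            if {y[i] for i in idx} == {0, 1}:
                break
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        all_ranks.append(ranks_from_scores(score_features(Xb, yb)))
    # Aggregate: average each feature's rank over all iterations.
    mean_ranks = [mean(r[j] for r in all_ranks) for j in range(len(X[0]))]
    selected = sorted(range(len(mean_ranks)), key=lambda j: mean_ranks[j])[:subset_size]
    return selected, mean_ranks


# Synthetic data: feature 0 separates the classes, features 1-3 are noise.
data_rng = random.Random(1)
y = [0] * 10 + [1] * 10
X = [[10 * label + data_rng.random()] + [data_rng.random() for _ in range(3)] for label in y]

selected, mean_ranks = ensemble_rank(X, y, n_iterations=20, subset_size=2)
print(selected, mean_ranks)
```

In this sketch the cost grows linearly with `n_iterations`, since each iteration reruns the ranker on a fresh sample, which is the trade-off the abstract describes between 10, 20, and 50 iterations.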
