Abstract

Ensemble feature selection has recently become a topic of interest for researchers, especially in bioinformatics. Its benefits include more stable and more useful feature (gene) subsets, together with classification performance comparable to (or better than) that of a single feature selection method. However, existing work on ensemble feature selection has concentrated on data diversity (applying a single feature selection method to multiple datasets, or to samples drawn from a single dataset), neglecting two other potential sources of diversity. We present two new approaches to gene selection that exploit these sources: functional diversity (applying multiple feature selection techniques to a single dataset) and hybrid diversity (a combination of data and functional diversity). To demonstrate the value of these new approaches, we measure the similarity between the feature subsets created by each of the three approaches across twenty-six datasets and ten feature selection techniques (or an ensemble of these techniques, as appropriate). We also compare the classification performance of models built using each of the three ensembles. Our results show that the similarity between the functional diversity and hybrid approaches is much higher than the similarity between either of them and data diversity, and that the distinction between data diversity and our new approaches is particularly strong for hard-to-learn datasets. In addition to being the most similar to each other, functional and hybrid diversity generally show better classification performance than data diversity, especially when selecting small feature subsets. These results demonstrate both that the new approaches can provide a different feature subset from the existing approach and that the resulting feature subset is potentially of interest to researchers. To our knowledge, no previous study has explored these new approaches to ensemble feature selection within the domain of bioinformatics.
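
The three ensemble strategies described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the two rankers (`mean_gap`, `signal_to_noise`), the stratified bootstrap, mean-rank aggregation, the Jaccard index as the similarity measure, and the synthetic data are all assumptions chosen for illustration, and the reading of "hybrid" as every ranker applied to every sample is one plausible interpretation.

```python
import random
from statistics import mean, pstdev


def split_by_class(X, y, j):
    """Return the values of feature j for class 1 and for class 0."""
    pos = [row[j] for row, label in zip(X, y) if label == 1]
    neg = [row[j] for row, label in zip(X, y) if label == 0]
    return pos, neg


def mean_gap(X, y):
    """Hypothetical ranker 1: absolute gap between the class means."""
    scores = []
    for j in range(len(X[0])):
        pos, neg = split_by_class(X, y, j)
        scores.append(abs(mean(pos) - mean(neg)))
    return scores


def signal_to_noise(X, y):
    """Hypothetical ranker 2: class-mean gap scaled by within-class spread."""
    scores = []
    for j in range(len(X[0])):
        pos, neg = split_by_class(X, y, j)
        spread = pstdev(pos) + pstdev(neg) + 1e-9  # avoid division by zero
        scores.append(abs(mean(pos) - mean(neg)) / spread)
    return scores


def to_ranks(scores):
    """Convert scores to ranks, where rank 0 is the highest-scoring feature."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    ranks = [0] * len(scores)
    for position, j in enumerate(order):
        ranks[j] = position
    return ranks


def top_k(rank_lists, k):
    """Aggregate rank lists by mean rank and keep the k best features."""
    mean_ranks = [mean(column) for column in zip(*rank_lists)]
    order = sorted(range(len(mean_ranks)), key=lambda j: mean_ranks[j])
    return set(order[:k])


def stratified_bootstrap(X, y, rng):
    """Resample with replacement inside each class so no class disappears."""
    sample = []
    for label in set(y):
        members = [i for i, value in enumerate(y) if value == label]
        sample += rng.choices(members, k=len(members))
    return [X[i] for i in sample], [y[i] for i in sample]


def data_diversity(X, y, ranker, k, rounds, rng):
    """One ranker applied to several samples of one dataset."""
    lists = [to_ranks(ranker(*stratified_bootstrap(X, y, rng)))
             for _ in range(rounds)]
    return top_k(lists, k)


def functional_diversity(X, y, rankers, k):
    """Several rankers applied to the same, unsampled dataset."""
    return top_k([to_ranks(ranker(X, y)) for ranker in rankers], k)


def hybrid(X, y, rankers, k, rounds, rng):
    """Several rankers applied to several samples (assumed interpretation)."""
    lists = []
    for _ in range(rounds):
        Xs, ys = stratified_bootstrap(X, y, rng)
        lists += [to_ranks(ranker(Xs, ys)) for ranker in rankers]
    return top_k(lists, k)


def jaccard(a, b):
    """Similarity of two feature subsets: shared features over all features."""
    return len(a & b) / len(a | b)


# Synthetic data: features 0 and 1 carry the class signal, 2-5 are noise.
rng = random.Random(0)
y = [0] * 10 + [1] * 10
X = [[rng.gauss(5 * label if j < 2 else 0, 1) for j in range(6)] for label in y]

rankers = [mean_gap, signal_to_noise]
data_subset = data_diversity(X, y, mean_gap, k=2, rounds=10, rng=rng)
functional_subset = functional_diversity(X, y, rankers, k=2)
hybrid_subset = hybrid(X, y, rankers, k=2, rounds=10, rng=rng)
print(jaccard(functional_subset, hybrid_subset))
```

On easy synthetic data like this the three ensembles should largely agree. The study's finding is that they diverge on real gene expression data, most strongly on hard-to-learn datasets, which a toy example of this size cannot reproduce.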

