An experimental comparison of feature selection methods on two-class biomedical datasets

P Drotár, J Gazda, Z Smékal

doi:10.1016/j.compbiomed.2015.08.010

Abstract

Feature selection is a significant part of many machine learning applications dealing with small-sample and high-dimensional data. Choosing the most important features is an essential step for knowledge discovery in many areas of biomedical informatics. The increased popularity of feature selection methods and their frequent utilisation raise challenging new questions about the interpretability and stability of feature selection techniques. In this study, we compared the behaviour of ten state-of-the-art filter methods for feature selection in terms of their stability, similarity, and influence on prediction performance. All of the experiments were conducted on eight two-class datasets from biomedical areas. While entropy-based feature selection appears to be the most stable, the feature selection techniques yielding the highest prediction performance are the minimum redundancy maximum relevance method and feature selection based on the Bhattacharyya distance. In general, univariate feature selection techniques perform similarly to, or even better than, more complex multivariate feature selection techniques on high-dimensional datasets. However, on more complex and smaller datasets, multivariate methods slightly outperform univariate techniques.
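To make the notion of stability in the abstract concrete, the following is a minimal sketch, not the study's protocol. The abstract does not state which stability measure, resampling scheme, or filter implementations were used, so everything here is an assumption chosen for illustration: a simple univariate filter (absolute difference of class means divided by the summed class standard deviations), stratified subsampling, and the mean pairwise Jaccard index between the selected feature subsets. The function names (`filter_scores`, `top_k`, `stability`, `make_data`) and the synthetic data are hypothetical.

```python
import random
from itertools import combinations
from statistics import mean, pstdev


def filter_scores(X, y):
    """Univariate filter: |difference of class means| / (sum of class std devs)."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        spread = pstdev(a) + pstdev(b) + 1e-12  # guard against division by zero
        scores.append(abs(mean(a) - mean(b)) / spread)
    return scores


def top_k(X, y, k):
    """Indices of the k highest-scoring features, as a set."""
    scores = filter_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return set(ranked[:k])


def jaccard(s, t):
    """Jaccard index of two non-empty sets."""
    return len(s & t) / len(s | t)


def stability(X, y, k, n_rounds=20, frac=0.8, seed=0):
    """Mean pairwise Jaccard index of top-k subsets over stratified subsamples.

    Assumes labels are 0/1 and each class has at least two samples.
    """
    rng = random.Random(seed)
    idx0 = [i for i, label in enumerate(y) if label == 0]
    idx1 = [i for i, label in enumerate(y) if label == 1]
    subsets = []
    for _ in range(n_rounds):
        # Sample each class separately so both classes are always present.
        pick = (rng.sample(idx0, max(2, int(frac * len(idx0))))
                + rng.sample(idx1, max(2, int(frac * len(idx1)))))
        subsets.append(top_k([X[i] for i in pick], [y[i] for i in pick], k))
    return mean(jaccard(s, t) for s, t in combinations(subsets, 2))


def make_data(n_per_class=20, n_features=50, n_informative=3, shift=4.0, seed=1):
    """Synthetic two-class data: the first n_informative features are shifted in class 1."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            if label == 1:
                for j in range(n_informative):
                    row[j] += shift
            X.append(row)
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_data()
    print("selected on full data:", sorted(top_k(X, y, 3)))
    print("stability (mean Jaccard):", stability(X, y, 3))
```

A value near 1 means the filter picks nearly the same features on every subsample, and a value near 0 means the selected subsets barely overlap. Comparing several filters this way, on the same subsamples, is one plausible reading of the kind of stability comparison the abstract describes.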

Journal: Computers in Biology and Medicine
Publication Date: Aug 24, 2015
Citations: 69

Similar Papers

An Empirical Study on the Stability of Feature Selection for Imbalanced Software Engineering Data
Huanjing Wang ... Taghi M Khoshgoftaar
01 Dec 2012

Gene selection stability's dependence on dataset difficulty
David J Dittman ... Randall Wald
01 Aug 2013

Measuring Stability of Feature Selection Techniques on Real-World Software Datasets
Huanjing Wang ... Randall Wald
01 Jan 2013

Comparative Analysis on the Stability of Feature Selection Techniques Using Three Frameworks on Biological Datasets
Randall Wald ... Amri Napolitano
01 Dec 2013
