Abstract

Motivation

Biomarker discovery methods are essential for identifying a minimal subset of features (e.g., serum markers in predictive medicine) that are relevant for developing prediction models with high accuracy. A wide variety of feature selection methods now exist, which are either embedded in, combined with, or independent of predictive learning algorithms. Many previous studies have shown the shortcomings of single feature selection results, which make it difficult for professionals in many fields (e.g., medical practitioners) to analyze and interpret the obtained feature subsets. Because each of these methods is highly biased, ensemble feature selection has the advantage of alleviating and compensating for such biases. With respect to the reliability, validity, and reproducibility of these methods, we examined eight different feature selection methods for binary classification datasets and developed an ensemble feature selection system.

Results

By using an ensemble of feature selection methods, the importance of the features could be quantified. Prediction models trained on the selected features showed improved prediction performance.

Electronic supplementary material

The online version of this article (doi:10.1186/s13040-016-0114-4) contains supplementary material, which is available to authorized users.
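The abstract does not spell out how the outputs of the eight methods are combined into a single importance quantification. As a minimal sketch, assuming each method yields one importance score per feature (higher means more important), one common way to build an ensemble is to rescale every method's scores to [0, 1] and average them with equal weights. The function names `min_max_normalize` and `ensemble_importance`, the method names in the usage lines, and the equal weighting are illustrative assumptions, not the authors' published implementation.

```python
from typing import Dict, List


def min_max_normalize(scores: List[float]) -> List[float]:
    """Rescale one method's importance scores to the range [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All features tied: assign zero importance instead of dividing by zero.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def ensemble_importance(method_scores: Dict[str, List[float]]) -> List[float]:
    """Average the normalized scores of several feature selection methods.

    Assumption: method_scores maps a (hypothetical) method name to one
    score per feature, all lists in the same feature order. Each method
    receives the equal weight 1/k.
    """
    k = len(method_scores)
    normalized = [min_max_normalize(s) for s in method_scores.values()]
    n_features = len(normalized[0])
    return [sum(col[j] for col in normalized) / k for j in range(n_features)]


# Usage with made-up scores for three features from three hypothetical methods.
scores = ensemble_importance({
    "abs_pearson": [0.9, 0.1, 0.5],
    "abs_spearman": [0.8, 0.2, 0.6],
    "rf_importance": [10.0, 2.0, 4.0],
})
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
print(ranking)  # → [0, 2, 1]
```

Normalizing first matters because raw scores from different methods (correlations versus random-forest importances, for example) live on different scales; without it, the method with the largest numbers would dominate the average.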

Highlights

  • In predictive medicine and molecular diagnostics, the need to simplify datasets with many parameters frequently arises

  • We evaluated our ensemble feature selection (EFS) method against the state-of-the-art method of Area Under the Curve (AUC)-based feature selection (AUC-FS) with regard to prediction performance in subsequent classification on six different datasets

  • The number of features selected by EFS and AUC-FS varies across datasets
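AUC-FS, the comparison method named in the highlights, ranks features by how well each one alone separates the two classes, measured by the area under the ROC curve. The sketch below is a minimal illustration under stated assumptions: it computes each feature's AUC with the Mann-Whitney pairwise formulation (ties count one half) and, as an assumed selection rule, keeps the k features whose AUC lies farthest from 0.5, so that markers that are strongly lower in the positive class are retained too. The function names `auc` and `auc_select`, the toy data, and the top-k rule are hypothetical; the exact AUC-FS criterion used in the study may differ.

```python
from typing import List, Sequence


def auc(values: Sequence[float], labels: Sequence[int]) -> float:
    """AUC of a single feature via the Mann-Whitney U statistic.

    Counts, over all (positive, negative) sample pairs, how often the
    positive sample has the larger value; ties count as one half.
    """
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_select(features: List[Sequence[float]], labels: Sequence[int], k: int) -> List[int]:
    """Return the indices of the k features whose AUC is farthest from 0.5 (assumed rule)."""
    separation = [abs(auc(f, labels) - 0.5) for f in features]
    order = sorted(range(len(features)), key=lambda j: separation[j], reverse=True)
    return order[:k]


# Toy data: four samples, binary labels, four candidate features.
labels = [0, 0, 1, 1]
features = [
    [1, 2, 3, 4],  # perfectly higher in class 1 -> AUC 1.0
    [4, 3, 2, 1],  # perfectly lower in class 1 -> AUC 0.0
    [1, 3, 2, 4],  # partially separating -> AUC 0.75
    [1, 1, 1, 1],  # constant -> AUC 0.5
]
print(auc_select(features, labels, k=2))  # → [0, 1]
```

The pairwise loop costs O(n_pos * n_neg) per feature, which is fine for illustration; a rank-based formula would scale better for large cohorts.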


Summary

Introduction

In predictive medicine and molecular diagnostics, the need to simplify datasets with many parameters frequently arises. Approaches are needed that can identify the important parameters (sometimes referred to as features, independent variables, or predictor variables). Quantifiable parameters of this kind that have diagnostic validity are called biomarkers. In 2001, the Biomarkers Definitions Working Group of the US National Institutes of Health (NIH) defined a biomarker as “a characteristic that is objectively measured and evaluated as an indication of normal biologic processes, pathogenic processes, or pharmacologic responses to a therapeutic intervention” [1]. Examples of biomarkers include serum parameters, genetic markers, and socio-demographic markers.
