Abstract

Background
With modern methods in biotechnology, the search for biomarkers has advanced to a challenging statistical task exploring high-dimensional data sets. Feature selection is a widely researched preprocessing step to handle huge numbers of biomarker candidates and is of special importance for the analysis of biomedical data. Such data sets often include many input features not related to the diagnostic or therapeutic target variable. A less researched but also relevant aspect for medical applications is the cost of different biomarker candidates. These costs are often financial, but can also refer to other aspects, for example the decision between a painful biopsy marker and a simple urine test. In this paper, we propose extensions to two feature selection methods to control the total amount of such costs: greedy forward selection and genetic algorithms. In comprehensive simulation studies of binary classification tasks, we compare the predictive performance, the run-time and the detection rate of relevant features for the newly proposed methods and five baseline alternatives for handling budget constraints.

Results
In simulations with a predefined budget constraint, our proposed methods outperform the baseline alternatives, with only minor differences between them. Only in the scenario without an actual budget constraint did our adapted greedy forward selection approach show a clear drop in performance compared to the other methods. However, introducing a hyperparameter to adapt the benefit-cost trade-off in this method could overcome this weakness.

Conclusions
In feature cost scenarios where a total budget has to be met, common feature selection algorithms are often not suitable to identify well-performing subsets for a modelling task. Adaptations of these algorithms, such as the ones proposed in this paper, can help to tackle this problem.
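As an illustration of how a hard budget can be enforced inside greedy forward selection, the following is a minimal Python sketch, not the authors' implementation: `candidates` (feature names), `costs` (cost per feature), `budget` and the `score` callable (e.g. cross-validated accuracy of a classifier fitted on a feature subset) are all placeholder names.

```python
# Minimal sketch of budget-constrained greedy forward selection
# (illustrative only, not the authors' implementation).

def greedy_forward_selection(candidates, costs, budget, score):
    """In each round, add the affordable feature with the largest score
    improvement, until no affordable feature improves the score."""
    selected, spent = [], 0.0
    best_score = score(selected)
    improved = True
    while improved:
        improved = False
        best_feature, best_new_score = None, best_score
        for f in candidates:
            # Skip features already chosen or too expensive for the remaining budget.
            if f in selected or spent + costs[f] > budget:
                continue
            s = score(selected + [f])
            if s > best_new_score:
                best_feature, best_new_score = f, s
        if best_feature is not None:
            selected.append(best_feature)
            spent += costs[best_feature]
            best_score = best_new_score
            improved = True
    return selected, best_score, spent
```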

Highlights

  • With modern methods in biotechnology, the search for biomarkers has advanced to a challenging statistical task exploring high-dimensional data sets

  • Approaches of cost-sensitive learning may be useful for situations where the goal is a trade-off between predictive performance and costs

  • In the initial population, features are included with a probability that depends on the total feature cost ($\sum_{i=1}^{P} c_i$) to be part of a candidate set. Using this initialization and the flexible constraint violation term of (5), we propose the genetic algorithm with fitness adaptation (see the sketch after this list)
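Below is a minimal Python sketch of the two ingredients named in the last bullet, under stated assumptions: the initialization rule and the penalty weight `lam` are illustrative guesses, not formula (5) or the paper's exact initialization, and `accuracy_fn` is a hypothetical scoring callable.

```python
import random

# Illustrative sketch of a budget-aware initialization and a penalized
# fitness for a genetic algorithm (not the paper's exact formulation).

def budget_aware_init(costs, budget, pop_size=50):
    """Draw individuals whose expected total cost roughly matches the budget."""
    p_include = min(1.0, budget / sum(costs))  # per-feature inclusion probability
    return [[1 if random.random() < p_include else 0 for _ in costs]
            for _ in range(pop_size)]

def penalized_fitness(individual, accuracy_fn, costs, budget, lam=1.0):
    """Subtract a penalty proportional to how far the total cost exceeds the budget."""
    total_cost = sum(c for bit, c in zip(individual, costs) if bit)
    violation = max(0.0, total_cost - budget)
    return accuracy_fn(individual) - lam * violation
```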


Summary

Introduction

With modern methods in biotechnology, the search for biomarkers has advanced to a challenging statistical task exploring high-dimensional data sets. Feature selection is a widely researched preprocessing step to handle huge numbers of biomarker candidates and is of special importance for the analysis of biomedical data. Such data sets often include many input features not related to the diagnostic or therapeutic target variable. Soft margin budgets have been investigated in the context of feature selection under the name cost-sensitive learning [1, 2]. This field covers flexible approaches harmonizing costs of misclassification and costs of features [3]. Approaches of cost-sensitive learning may be useful for situations where the goal is a trade-off between predictive performance and costs.
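As a generic illustration of this distinction (placeholder notation, not taken from the paper: $\mathrm{Err}(S)$ is the misclassification error of a feature subset $S$, $c_i$ the cost of feature $i$, $\lambda$ a trade-off weight and $B$ a budget):

$$
\min_{S \subseteq \{1,\dots,p\}} \; \mathrm{Err}(S) + \lambda \sum_{i \in S} c_i
\qquad \text{versus} \qquad
\min_{S \subseteq \{1,\dots,p\}} \mathrm{Err}(S) \;\; \text{s.t.} \;\; \sum_{i \in S} c_i \le B .
$$

The left problem corresponds to the soft, cost-sensitive trade-off discussed here, while the right problem is the hard budget constraint addressed by the methods proposed in this paper.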

