LLpowershap: logistic loss-based automated Shapley values feature selection method

Iqbal Madakkatel, Elina Hyppönen

doi:10.1186/s12874-024-02370-8

Abstract

Background

Shapley values have been used extensively in machine learning, not only to explain black-box models but also for other tasks such as model debugging, sensitivity and fairness analyses, and selecting important features for robust modelling and further follow-up analyses. Shapley values satisfy certain axioms that promote fairness in distributing the contributions of features toward a prediction or toward reducing error, while accounting for non-linear relationships and interactions when complex machine learning models are employed. Recently, feature selection methods based on predictive Shapley values and p-values, including powershap, have been introduced.

Methods

We present a novel feature selection method, LLpowershap, that builds on these recent advances by employing loss-based Shapley values to identify informative features while keeping noise features among the selected sets to a minimum. We also improve the calculation of p-values and statistical power, which are used to identify informative features and to estimate the number of iterations of model development and testing.

Results

Our simulation results show that LLpowershap not only identifies a higher number of informative features but also outputs fewer noise features than other state-of-the-art feature selection methods. Benchmarking results on four real-world datasets show that LLpowershap achieves predictive performance higher than or comparable to that of other Shapley-based wrapper methods and filter methods. LLpowershap also achieves the best mean ranking among the seven feature selection methods tested on the benchmark datasets.

Conclusion

Our results demonstrate that LLpowershap is a viable wrapper feature selection method for large biomedical datasets and other settings.
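The abstract names two ingredients of LLpowershap: Shapley values computed on the logistic loss rather than on the model's prediction, and p-values that decide whether a feature is informative. The Python sketch below illustrates both ideas at toy scale and is not the authors' implementation. It assumes a fixed logistic-regression model, replaces features outside a coalition with baseline values, enumerates all coalitions exactly (feasible only for a handful of features), and uses a simple empirical one-sided p-value that compares a feature's scores with those of a noise feature. The names `loss_shapley` and `empirical_p_value`, the baseline imputation, and the p-value formula are hypothetical choices made for illustration; the repeated model development and testing, and the power calculation used to pick the number of iterations, are omitted.

```python
import math
from itertools import combinations


def log_loss(y, p, eps=1e-12):
    """Binary logistic (cross-entropy) loss for one sample; p is clipped for stability."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def predict(weights, bias, x):
    """Probability of the positive class from a fixed logistic-regression model."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def loss_shapley(weights, bias, x, y, baseline):
    """Exact Shapley values of the logistic loss for a single sample.

    Assumption (illustrative only): a feature outside the coalition is replaced by
    its baseline value. The values sum to loss(x) - loss(baseline); a negative
    value means the feature lowered the loss for this sample.
    """
    n = len(x)

    def value(coalition):
        x_s = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return log_loss(y, predict(weights, bias, x_s))

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            w = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                phi[i] += w * (value(s | {i}) - value(s))
    return phi


def empirical_p_value(feature_scores, noise_scores):
    """Hypothetical one-sided empirical p-value: the share of noise-feature scores
    (one per iteration) that are at least as large as the feature's mean score.
    Scores are importances where larger means more useful."""
    mean_feature = sum(feature_scores) / len(feature_scores)
    exceed = sum(1 for s in noise_scores if s >= mean_feature)
    return exceed / len(noise_scores)


if __name__ == "__main__":
    # Toy model: feature 0 is informative, feature 1 has zero weight (pure noise).
    phi = loss_shapley([2.0, 0.0], 0.0, [1.0, 3.0], 1, [0.0, 0.0])
    print([round(v, 4) for v in phi])  # → [-0.5662, 0.0]
    print(empirical_p_value([0.5, 0.6], [0.1, 0.2, 0.7, 0.3]))  # → 0.25
```

Because the second feature never changes the prediction, its loss-based Shapley value is exactly zero, while the first feature receives a negative value because it lowers the loss for a positive sample. Under this sign convention, importance scores fed to the p-value step would be, for example, negated mean loss contributions. Exact enumeration needs a number of model evaluations that grows exponentially with the number of features, so practical tools estimate Shapley values instead; the sketch only shows the quantity being estimated.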

More From: BMC Medical Research Methodology

Similar Papers

Increasing trust in complex machine learning systems
Jaehun Kim
ACM SIGIR Forum | VOL. 55
01 Jun 2021

Rock Type Classification Models Interpretability Using Shapley Values
Anton Georgievich Voskresenskiy ... Maria Alexandrovna Kuntsevich
09 Dec 2021

Explainable Machine Learning for Credit Risk Management When Features are Dependent
Thanh Thuy Do ... Paolo Pagnottoni
Measurement: Interdisciplinary Research and Perspectives | VOL. 22
16 Oct 2023

Landslide susceptibility assessment using feature selection-based machine learning models
...
Geomechanics and Engineering | VOL. 25
01 Jan 2020
