Inferring feature importance with uncertainties with application to large genotype data

Pål Vegard Johnsen, Signe Riemer-Sørensen, Mette Langaas, Andrew Thomas Dewan, Inga Strümke

doi:10.1371/journal.pcbi.1010963

Abstract

Estimating feature importance, that is, the contribution of a feature to one or several predictions, is an essential aspect of explaining data-based models. Besides explaining the model itself, an equally relevant question is which features are important in the underlying data-generating process. We present a Shapley-value-based framework for inferring the importance of individual features, including uncertainty in the estimator. We build upon the recently published model-agnostic feature importance score SAGE (Shapley additive global importance) and introduce Sub-SAGE. For tree-based models, Sub-SAGE has the advantage that it can be estimated without computationally expensive resampling. We argue that, for all model types, the uncertainties in our Sub-SAGE estimator can be estimated using bootstrapping, and we demonstrate the approach for tree ensemble methods. The framework is exemplified on synthetic data as well as on large genotype data, where it is used to infer feature importance with respect to obesity.
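As a rough illustration of the bootstrapping idea mentioned in the abstract, the following minimal sketch attaches a percentile interval to a feature importance estimate. It is not the authors' Sub-SAGE estimator: the importance score used here (the increase in squared-error loss when a feature is replaced by its sample mean) is a deliberately crude stand-in, the model is fixed rather than refitted, and the names `importance`, `bootstrap_interval`, `rows`, and `model` are all hypothetical.

```python
import random
import statistics


def importance(rows, model, feature):
    """Loss increase when `feature` is replaced by its sample mean.

    Assumption: a simple stand-in score, NOT the Sub-SAGE value from the paper.
    """
    mean_value = statistics.fmean(x[feature] for x, _ in rows)
    base_loss = statistics.fmean((model(x) - y) ** 2 for x, y in rows)
    masked_loss = statistics.fmean(
        (model(x[:feature] + (mean_value,) + x[feature + 1:]) - y) ** 2
        for x, y in rows
    )
    return masked_loss - base_loss


def bootstrap_interval(rows, model, feature, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the importance of one feature."""
    rng = random.Random(seed)
    # Resample rows with replacement and recompute the score each time.
    estimates = sorted(
        importance(rng.choices(rows, k=len(rows)), model, feature)
        for _ in range(n_boot)
    )
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic data (assumed for illustration): y depends on x0 only, x1 is noise.
data_rng = random.Random(1)
rows = []
for _ in range(200):
    x0, x1 = data_rng.gauss(0, 1), data_rng.gauss(0, 1)
    rows.append(((x0, x1), 2 * x0 + data_rng.gauss(0, 0.5)))


def model(x):
    # Hypothetical fixed "trained" model that uses only the first feature.
    return 2 * x[0]


lo0, hi0 = bootstrap_interval(rows, model, feature=0)
lo1, hi1 = bootstrap_interval(rows, model, feature=1)
print(f"feature 0: [{lo0:.3f}, {hi0:.3f}]")
print(f"feature 1: [{lo1:.3f}, {hi1:.3f}]")
```

In this toy setting the interval for the informative feature lies above zero, while the interval for the unused feature collapses to zero because the model never reads it. A fuller treatment would also account for the variability introduced by model fitting, which this sketch ignores.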

Journal: PLOS Computational Biology	Publication Date: Mar 14, 2023
Citations: 2	License type: CC BY 4.0
