Abstract

Background: Machine learning (ML) can be an effective tool to extract information from attribute-rich molecular datasets for the generation of molecular diagnostic tests. However, the way in which the resulting scores or classifications are produced from the input data may not be transparent. Algorithmic explainability or interpretability has become a focus of ML research. Shapley values, first introduced in game theory, can provide explanations of the result generated from a specific set of input data by a complex ML algorithm.

Methods: For a multivariate molecular diagnostic test in clinical use (the VeriStrat® test), we calculate and discuss the interpretation of exact Shapley values. We also employ some standard approximation techniques for Shapley value computation (local interpretable model-agnostic explanation (LIME) and Shapley Additive Explanations (SHAP) based methods) and compare the results with exact Shapley values.

Results: Exact Shapley values calculated for data collected from a cohort of 256 patients showed that the relative importance of attributes for test classification varied by sample. While all eight features used in the VeriStrat® test contributed equally to classification for some samples, other samples showed more complex patterns of attribute importance for classification generation. Exact Shapley values and Shapley-based interaction metrics were able to provide interpretable classification explanations at the sample or patient level, while patient subgroups could be defined by comparing Shapley value profiles between patients. LIME and SHAP approximation approaches, even those seeking to include correlations between attributes, produced results that were quantitatively and, in some cases, qualitatively different from the exact Shapley values.

Conclusions: Shapley values can be used to determine the relative importance of input attributes to the result generated by a multivariate molecular diagnostic test for an individual sample or patient. Patient subgroups defined by Shapley value profiles may motivate translational research. However, correlations inherent in molecular data and the typically small ML training sets available for molecular diagnostic test development may cause some approximation methods to produce approximate Shapley values that differ both qualitatively and quantitatively from exact Shapley values. Hence, caution is advised when using approximate methods to evaluate Shapley explanations of the results of molecular diagnostic tests.
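As a minimal, illustrative sketch (not the authors' implementation), exact Shapley values for a test with a small number of attributes, such as the eight VeriStrat features, can be computed by enumerating all coalitions. The value function `value_fn` below, and the way "absent" features are handled (e.g. substitution by reference values), are hypothetical placeholders.

```python
from itertools import combinations
from math import factorial

def exact_shapley_values(value_fn, features):
    """Exact Shapley values by brute-force enumeration of coalitions.

    value_fn : maps a frozenset of feature indices (a coalition of "present"
               features) to the model output obtained when the remaining
               features are treated as absent (e.g. set to reference values).
    features : iterable of feature indices, e.g. range(8) for a test built
               on eight attributes.
    """
    features = list(features)
    n = len(features)
    phi = {}
    for i in features:
        others = [j for j in features if j != i]
        total = 0.0
        for size in range(n):  # |S| ranges over 0 .. n-1
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value_fn(s | {i}) - value_fn(s))
        phi[i] = total
    return phi

# Toy usage: for an additive "game" the Shapley value of each feature
# recovers its additive weight, which is a quick sanity check.
weights = [0.5, 1.0, -0.3, 0.8, 0.0, 0.2, -0.1, 0.4]
toy_value = lambda coalition: sum(weights[k] for k in coalition)
print(exact_shapley_values(toy_value, range(8)))
```

With eight attributes the enumeration involves only 2^8 = 256 distinct coalitions, which is why exact computation is tractable for a test of this size.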

Highlights

  • Machine learning (ML) can be an effective tool to extract information from attribute-rich molecular datasets for the generation of molecular diagnostic tests

  • We investigated three expressions proposed to characterize the importance of pairs of features, or interactions, for classification: Shapley interaction indices (SII) [14], Shapley–Taylor interaction indices (STII) [16], and Harsanyi dividends (HD) [7]; the STII main effects term for i = j is defined as STII_ii(f) = f({i}) − f(∅)

  • Exact Shapley interaction indices and Harsanyi dividends: to assess the importance of pairs of features to the classification from the VS algorithm for each instance, we evaluated three previously proposed quantities: SIIs [14], Shapley–Taylor interaction indices (STIIs) [16], and HDs [7]; standard forms of these quantities are sketched after these highlights. (Note that while SIIs and STIIs evaluate the contribution of features i and j in the context of coalitions of other features, HDs only consider features i and j in isolation.) The results are shown in the heatmap of Fig. 5 for all pairs of distinct features i, j (i ≠ j) for six instances: a uniform Good, a non-uniform Good, and a boundary Good instance, and corresponding examples of Poor instances
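For reference, the quantities named above can be written as follows. This is our transcription of the standard definitions from the cited works ([14], [16], [7]); the notation, in particular the discrete second difference Δ_ij, is ours, and normalization conventions may differ slightly from the original references.

```latex
% Shapley value of feature i for value function f on subsets of N, |N| = n
\[ \phi_i(f) = \sum_{S \subseteq N \setminus \{i\}}
   \frac{|S|!\,(n-|S|-1)!}{n!}\,\bigl[f(S \cup \{i\}) - f(S)\bigr] \]

% discrete second difference used by the pairwise indices
\[ \Delta_{ij} f(S) = f(S \cup \{i,j\}) - f(S \cup \{i\}) - f(S \cup \{j\}) + f(S) \]

% Shapley interaction index (SII) [14]
\[ \mathrm{SII}_{ij}(f) = \sum_{S \subseteq N \setminus \{i,j\}}
   \frac{|S|!\,(n-|S|-2)!}{(n-1)!}\,\Delta_{ij} f(S) \]

% Shapley-Taylor interaction index (STII) of order 2 [16], for i \neq j,
% with main effects term STII_{ii}(f) = f(\{i\}) - f(\emptyset)
\[ \mathrm{STII}_{ij}(f) = \frac{2}{n} \sum_{S \subseteq N \setminus \{i,j\}}
   \binom{n-1}{|S|}^{-1} \Delta_{ij} f(S) \]

% Harsanyi dividend (HD) of the pair {i, j} [7]
\[ d_{\{i,j\}}(f) = f(\{i,j\}) - f(\{i\}) - f(\{j\}) + f(\emptyset) \]
```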


Introduction

Machine learning (ML) can be an effective tool to extract information from attribute-rich molecular datasets for the generation of molecular diagnostic tests. Shapley values, first introduced in game theory, can provide explanations of the result generated from a specific set of input data by a complex ML algorithm. For a multivariate molecular diagnostic test in clinical use (the VeriStrat® test), we calculate and discuss the interpretation of exact Shapley values. Diagnostic tests produced from large numbers of attributes via ML can be effective predictors of outcome, making use of the information in these highly multivariate data inputs to improve performance and robustness. However, neither the way in which the tests produce a result for a given patient, nor the biological rationale underlying the tests, may be transparent. Concerns about biases in ML implementations, including those containing the attributes gender or race [2, 3], and the recognition of the right of individuals to understand how their personal data are being used, have highlighted the need for interpretable explanations and quantification of how attributes are used by complex ML algorithms [4]
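The paper compares exact Shapley values with LIME- and SHAP-based approximations. Purely as an illustrative sketch of how such approximate values are commonly obtained (this is not the authors' pipeline; the scoring function, background data, and sampling budget below are hypothetical placeholders), the shap package's KernelExplainer can be applied to any scoring function:

```python
import numpy as np
import shap  # SHAP approximation library (Lundberg & Lee)

def model_score(X):
    # Hypothetical placeholder scoring function; in practice this would be
    # the trained diagnostic classifier's continuous output.
    return X.sum(axis=1)

background = np.zeros((1, 8))   # reference values standing in for "absent" attributes
explainer = shap.KernelExplainer(model_score, background)

x = np.random.rand(1, 8)        # a single instance (eight attributes) to explain
approx_values = explainer.shap_values(x, nsamples=2000)
print(approx_values)            # approximate Shapley values, one per attribute
```

KernelExplainer samples coalitions and, by default, treats attributes as independent when substituting background values, which is one reason the paper cautions that approximate methods can diverge from exact Shapley values for correlated molecular attributes.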

