Abstract
Identification of disease subtypes and corresponding biomarkers can substantially improve clinical diagnosis and treatment selection. Discovering these subtypes in noisy, high-dimensional biomedical data is often impossible for humans and challenging for machines. We introduce a new approach to facilitate the discovery of disease subtypes: Instead of analyzing the original data, we train a diagnostic classifier (healthy vs. diseased) and extract instance-wise explanations for the classifier’s decisions. The distribution of instances in the explanation space of our diagnostic classifier amplifies the different reasons for belonging to the same class, resulting in a representation that is uniquely useful for discovering latent subtypes. We compare our ability to recover subtypes via cluster analysis on model explanations to classical cluster analysis on the original data. In multiple datasets with known ground-truth subclasses, particularly on UK Biobank brain imaging data and transcriptome data from the Cancer Genome Atlas, we show that cluster analysis on model explanations substantially outperforms the classical approach. While we believe clustering in explanation space to be particularly valuable for inferring disease subtypes, the method is more general and applicable to any kind of subtype identification.
Highlights
Identification of disease subtypes and corresponding biomarkers can substantially improve clinical diagnosis and treatment selection
We propose a novel space that we believe to be useful for identifying latent subtypes: the space of explanations corresponding to a diagnostic classifier
We argue that the explanation space of a diagnostic classifier is an appropriate embedding space for subsequent cluster analyses aimed at the discovery of latent disease subtypes
Summary
Identification of disease subtypes and corresponding biomarkers can substantially improve clinical diagnosis and treatment selection. Discovering these subtypes in noisy, high-dimensional biomedical data is often impossible for humans and challenging for machines. Recent interest in explaining the output of complex machine learning models has produced a wide range of approaches [8, 9], most of them focused on providing an instance-wise explanation of a model’s output as either a subset of input features [10, 11] or a weighting of input features [12, 13]. The latter, where each input feature is weighted according to its contribution to the underlying model’s output for an instance, can be thought of as specifying a transformation from feature space to an explanation space. In the case of a diagnostic classifier (healthy vs. diseased), the explanation space relates to the investigated disease.
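The idea above can be sketched in a few lines. The following is a minimal, illustrative example, not the authors' implementation: it uses synthetic data with two hypothetical disease subtypes, a logistic regression as the diagnostic classifier, and a simple linear attribution (coefficient times feature value) as the instance-wise explanation, standing in for whatever explanation method [12, 13] one would use in practice. Clustering is then performed on the explanations of the diseased instances only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

# Synthetic stand-in for a diagnostic task: two latent "subtypes"
# drive the diseased class through different feature groups.
rng = np.random.default_rng(0)
n = 400
X_healthy = rng.normal(0, 1, (n, 10))
X_sub1 = rng.normal(0, 1, (n // 2, 10))
X_sub1[:, 0:3] += 2.0   # subtype 1: signal in features 0-2
X_sub2 = rng.normal(0, 1, (n // 2, 10))
X_sub2[:, 5:8] += 2.0   # subtype 2: signal in features 5-7
X = np.vstack([X_healthy, X_sub1, X_sub2])
y = np.array([0] * n + [1] * n)  # 0 = healthy, 1 = diseased

# Diagnostic classifier (healthy vs. diseased).
clf = LogisticRegression(max_iter=1000).fit(X, y)

# Instance-wise explanation: each feature weighted by its contribution
# to the linear model's logit (coefficient * feature value). This maps
# every instance from feature space into explanation space.
explanations = X * clf.coef_  # shape (n_samples, n_features)

# Cluster only the diseased instances, in explanation space,
# to recover the latent subtypes.
diseased_expl = explanations[y == 1]
subtype_labels = KMeans(n_clusters=2, n_init=10,
                        random_state=0).fit_predict(diseased_expl)
```

For a nonlinear classifier, the per-instance weighting would come from an attribution method such as layer-wise relevance propagation or SHAP instead of the raw coefficients; the clustering step is unchanged.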