Abstract

Patient stratification methods are key to the vision of precision medicine. Here, we consider transcriptional data to segment the patient population into subsets relevant to a given phenotype. Whereas most existing patient stratification methods focus either on predictive performance or interpretable features, we developed a method striking a balance between these two important goals. We introduce a Bayesian method called SUBSTRA that uses regularized biclustering to identify patient subtypes and interpretable subtype-specific transcript clusters. The method iteratively re-weights feature importance to optimize phenotype prediction performance by producing more phenotype-relevant patient subtypes. We investigate the performance of SUBSTRA in finding relevant features using simulated data and successfully benchmark it against state-of-the-art unsupervised stratification methods and supervised alternatives. Moreover, SUBSTRA achieves predictive performance competitive with the supervised benchmark methods and provides interpretable transcriptional features in diverse biological settings, such as drug response prediction, cancer diagnosis, or kidney transplant rejection. The R code of SUBSTRA is available at https://github.com/sahandk/SUBSTRA. Supplementary data are available at Bioinformatics online.

Highlights

  • One important challenge for precision medicine is to improve patient treatment based on molecular markers while simultaneously ensuring interpretability of the resulting signatures

  • Our method identifies a transcript cluster related to epithelial-mesenchymal transition (EMT) involved in cell-substrate adhesion as a key pathway responding to selumetinib

  • SUBSTRA achieves both good interpretability and accurate phenotype prediction, a combination lacking in state-of-the-art phenotype prediction methods (Valdes et al, 2016) such as the Support Vector Machine (SVM)

Summary

Introduction

One important challenge for precision medicine is to improve patient treatment based on molecular markers while simultaneously ensuring interpretability of the resulting signatures. A key task is to reliably identify and weight transcriptional features based on their relevance to the target phenotype and to use these weights for patient stratification in a predictive setting. Some methods tackle this problem by incorporating patient strata into phenotype prediction. In another work, Ahmad and Fröhlich (2017) incorporated survival data into patient stratification to improve the separability of disease subtypes with regard to their survival curves. They introduced a novel Hierarchical Bayesian Graphical Model, termed Survival-based Bayesian Clustering, which combines a Dirichlet Process Gaussian Mixture Model with an Accelerated Failure Time (AFT) model to simultaneously cluster heterogeneous genomic, transcriptomic and time-to-event data. The former objective corresponds to interpretability and the latter to accuracy.
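The idea of weighting transcriptional features by their relevance to the phenotype and then using those weights to stratify patients can be illustrated with a small sketch. This is not SUBSTRA's Bayesian biclustering. It swaps in two much simpler stand-ins: a standardized mean difference between phenotype classes as the feature weight, and a weighted k-means for the stratification step. The function names, the toy data, and the farthest-first initialization are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def feature_weights(X, y):
    """Weight each feature by the absolute difference of class means,
    scaled by the feature's overall spread, then normalize to sum to 1.
    (Illustrative stand-in, not the weighting scheme used by SUBSTRA.)"""
    weights = []
    for col in zip(*X):
        pos = [v for v, label in zip(col, y) if label == 1]
        neg = [v for v, label in zip(col, y) if label == 0]
        spread = pstdev(col) or 1.0  # guard against constant features
        weights.append(abs(mean(pos) - mean(neg)) / spread)
    total = sum(weights) or 1.0
    return [w / total for w in weights]


def weighted_sq_dist(a, b, w):
    """Squared Euclidean distance with per-feature weights."""
    return sum(wj * (aj - bj) ** 2 for aj, bj, wj in zip(a, b, w))


def weighted_kmeans(X, w, k=2, n_iter=20):
    """Plain k-means on weighted distances with a deterministic
    farthest-first initialization (hypothetical helper)."""
    centers = [list(X[0])]
    while len(centers) < k:
        far = max(X, key=lambda x: min(weighted_sq_dist(x, c, w) for c in centers))
        centers.append(list(far))
    labels = [0] * len(X)
    for _ in range(n_iter):
        labels = [
            min(range(k), key=lambda c: weighted_sq_dist(x, centers[c], w))
            for x in X
        ]
        for c in range(k):
            members = [x for x, lab in zip(X, labels) if lab == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = [mean(col) for col in zip(*members)]
    return labels


# Toy data (made up): feature 0 tracks the phenotype, feature 1 is
# high-variance noise that is unrelated to it.
X = [
    [0.0, -5.0], [0.2, 5.0], [0.1, -4.0], [0.3, 4.0],
    [2.0, 5.0], [2.2, -5.0], [2.1, 4.0], [1.9, -4.0],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]

weights = feature_weights(X, y)
labels = weighted_kmeans(X, weights, k=2)
print("feature weights:", weights)
print("patient strata: ", labels)
```

In this toy setting the noisy feature has a much larger variance than the informative one, so an unweighted distance would be dominated by noise. Down-weighting it lets the strata follow the phenotype-relevant signal, which is the intuition behind phenotype-guided stratification.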

Methods
Biclustering
Feature Weighting
Use equation 3 to sample ctj
Use equation 2 to sample cpi
Experiments and Results
Predictive Performance Evaluation
Descriptive Performance Evaluation
Experiments with Synthetic Data
Experiments with Real Data
Method
SUBSTRA Finds Relevant Transcript Clusters
Runtime of SUBSTRA
Conclusion