Abstract

Reliable identification of molecular biomarkers is essential for accurate patient stratification. While state-of-the-art machine learning approaches for sample classification continue to push boundaries in terms of performance, most of these methods are not able to integrate different data types and lack generalization power, limiting their application in a clinical setting. Furthermore, many methods behave as black boxes, and we have very little understanding about the mechanisms that lead to the prediction. While opaqueness concerning machine behavior might not be a problem in deterministic domains, in health care, providing explanations about the molecular factors and phenotypes that are driving the classification is crucial to build trust in the performance of the predictive system. We propose Pathway-Induced Multiple Kernel Learning (PIMKL), a methodology to reliably classify samples that can also help gain insights into the molecular mechanisms that underlie the classification. PIMKL exploits prior knowledge in the form of a molecular interaction network and annotated gene sets, by optimizing a mixture of pathway-induced kernels using a Multiple Kernel Learning (MKL) algorithm, an approach that has demonstrated excellent performance in different machine learning applications. After optimizing the combination of kernels to predict a specific phenotype, the model provides a stable molecular signature that can be interpreted in the light of the ingested prior knowledge and that can be used in transfer learning tasks.
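The idea of building one kernel per pathway and then learning a weighted mixture can be sketched in a few lines. The following is a minimal illustration only, not the authors' implementation. It assumes that each pathway-induced kernel is a quadratic form over the normalized Laplacian of the interaction sub-network restricted to the pathway's genes, and it replaces the MKL optimizer with a simple kernel-target alignment heuristic as a stand-in. All function names, the toy network, and the gene sets are hypothetical.

```python
import numpy as np

def normalized_laplacian(adj):
    """Symmetric normalized Laplacian of an undirected graph (zero rows for isolated nodes)."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    connected = deg > 0
    inv_sqrt = np.zeros_like(deg)
    inv_sqrt[connected] = 1.0 / np.sqrt(deg[connected])
    return np.diag(connected.astype(float)) - inv_sqrt[:, None] * adj * inv_sqrt[None, :]

def pathway_kernel(X, adj, gene_idx):
    """Assumed kernel form: K = X_P L_P X_P^T, restricted to the genes of one pathway."""
    sub_adj = np.asarray(adj, dtype=float)[np.ix_(gene_idx, gene_idx)]
    lap = normalized_laplacian(sub_adj)
    Xp = X[:, gene_idx]
    return Xp @ lap @ Xp.T

def alignment_weights(kernels, y):
    """Stand-in for MKL: weight each kernel by its clipped alignment with the label kernel y y^T."""
    target = np.outer(y, y)
    scores = []
    for K in kernels:
        denom = np.linalg.norm(K) * np.linalg.norm(target)  # Frobenius norms
        scores.append(max(np.sum(K * target) / denom, 0.0) if denom > 0 else 0.0)
    scores = np.array(scores)
    total = scores.sum()
    # Fall back to uniform weights if no kernel aligns positively with the labels.
    return scores / total if total > 0 else np.full(len(kernels), 1.0 / len(kernels))

# Toy data (hypothetical): 6 samples, 4 genes, a path-shaped interaction network.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
y = np.array([1, 1, 1, -1, -1, -1])
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]])
pathways = {"pathway_A": [0, 1, 2], "pathway_B": [2, 3]}

kernels = [pathway_kernel(X, adj, idx) for idx in pathways.values()]
weights = alignment_weights(kernels, y)
combined = sum(w * K for w, K in zip(weights, kernels))
```

In this sketch the learned weights play the role of the molecular signature: a pathway with a larger weight contributes more to the combined kernel, which could then be passed to any kernel classifier that accepts a precomputed Gram matrix.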

Highlights

  • Designing reliable and interpretable predictive models for patient stratification and biomarker discovery is a daunting challenge in computational biology

  • We discuss the application of Pathway-Induced Multiple Kernel Learning (PIMKL) to different breast cancer cohorts

  • We present PIMKL, a novel, effective, and interpretable machine learning methodology for phenotype prediction using multi-modal molecular data


Summary

Introduction

Designing reliable and interpretable predictive models for patient stratification and biomarker discovery is a daunting challenge in computational biology. The methods included, among others, average pathway expression,[14] classification by significant hub genes,[15] pathway activity classification,[16] and a series of approaches based on Support Vector Machines (SVMs), such as network-based SVMs,[17] recursive feature elimination SVMs,[18] and graph diffusion kernels for SVMs.[19,20] The study concluded that, while none of the evaluated approaches significantly improved classification accuracy, integrating prior knowledge greatly enhanced the interpretability of the resulting gene signatures.
