Abstract

Changes in protein and gene expression levels are often used as features in predictive modeling tasks such as survival prediction. A common strategy for aggregating the information contained in individual proteins is to integrate their expression levels with biological networks. In this work, we propose a novel patient representation, PRER (Pairwise Relative Expressions with Random walks), which integrates proteins’ expression levels with protein-protein interaction (PPI) networks. PRER captures the dysregulation patterns of proteins based on the neighborhood of each protein in the PPI network. Specifically, PRER computes a feature vector for a patient by comparing a source protein’s expression level with the levels of the other proteins in its neighborhood. The neighborhood of the source protein is derived by a biased random-walk strategy on the network. We test PRER’s performance on the survival prediction task in 10 different cancers using random forest survival models. PRER yields a statistically significant improvement in predictive performance in 9 out of 10 cancers when compared to the same model trained with features based on individual protein expressions. Furthermore, we identify pairs of proteins whose interactions are predictive of patient survival even though their individual expression levels are not. The set of identified relations provides a valuable collection of protein biomarkers with high prognostic value. PRER can be used for other complex diseases and prediction tasks that take molecular expression profiles as input. PRER is freely available at: https://github.com/hikuru/PRER.
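To make the idea of a pairwise relative expression feature concrete, the following is a minimal sketch, not the authors' implementation (for that, see the repository linked above). It makes several assumptions that the abstract does not state: the walk bias follows a node2vec-style scheme with hypothetical parameters p and q, a protein's neighborhood is taken to be the nodes visited most often by walks starting from it, and each feature is a binary indicator of whether the source protein's level exceeds the neighbor's level. The toy network, the expression values, and all function names are illustrative.

```python
import random
from collections import Counter


def biased_walk(graph, start, length, p, q, rng):
    """One second-order random walk (node2vec-style bias; an assumption)."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = sorted(graph[cur])  # sorted so results do not depend on set order
        if not nbrs:
            break
        if len(walk) == 1:
            walk.append(rng.choice(nbrs))  # first step is unbiased
            continue
        prev = walk[-2]
        weights = []
        for nxt in nbrs:
            if nxt == prev:
                weights.append(1.0 / p)  # return to the previous node
            elif nxt in graph[prev]:
                weights.append(1.0)      # stay near the previous node
            else:
                weights.append(1.0 / q)  # move farther away
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk


def neighborhood(graph, source, size, n_walks=50, length=5, p=1.0, q=0.5, seed=0):
    """Return up to `size` nodes most often visited by walks from `source`."""
    rng = random.Random(seed)
    visits = Counter()
    for _ in range(n_walks):
        visits.update(biased_walk(graph, source, length, p, q, rng)[1:])
    visits.pop(source, None)  # a protein is not its own neighbor
    ranked = sorted(visits, key=lambda n: (-visits[n], n))
    return ranked[:size]


def prer_features(expression, pairs):
    """Binary feature per (source, neighbor) pair: 1 if source level is higher."""
    return [1 if expression[s] > expression[t] else 0 for s, t in pairs]


# Toy PPI network as an adjacency mapping (hypothetical proteins A-D).
ppi = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}

# The pairs depend only on the network, so they are computed once
# and reused for every patient.
pairs = [(s, t) for s in sorted(ppi) for t in neighborhood(ppi, s, size=2)]

# Hypothetical expression profile for a single patient.
patient = {"A": 2.1, "B": 0.4, "C": 1.3, "D": -0.7}
features = prer_features(patient, pairs)
```

Because the pair list is fixed by the network rather than by any one patient, every patient is mapped to a vector of the same length, which could then be passed to a survival model in place of the raw expression values.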

Highlights

  • With the advances in sequencing technologies, large-scale molecular profiling of patients has become possible

  • To assess whether the Pairwise Relative Expressions with Random walks (PRER) representation captures the molecular expression profiles better than the individual protein expression values, we use these representations for survival prediction

  • We use the protein expression values as input, which is the typical approach taken in survival prediction


Introduction

With the advances in sequencing technologies, large-scale molecular profiling of patients has become possible. The comprehensive profiling of cancer patients, along with their clinical data, presents an opportunity to gain deeper insights into cancer and to develop prediction tools for disease outcome. Machine learning has been an instrumental tool in various studies pursuing this aim. In these studies, patients are often represented by their molecular profiles, such as protein or gene expression levels. Yuan et al. [1] assessed the utility of different types of molecular data for survival prediction, considering miRNA, protein, and mRNA expression. Others have followed similar approaches for different clinical outcomes [2,3,4].

