Abstract

Background: Discovering clinically relevant and statistically significant subtypes of cancer is of paramount importance in precision medicine, which aims to tailor treatment to individual needs. For this purpose, gene expression profiles of patients are analysed in an unsupervised manner, and the clinical relevance of the discovered subtypes is then assessed. However, the high-dimensional nature of omics data leads to the ‘curse of dimensionality’ and results in subtypes whose subtype-specific survival curves overlap. None of the existing methods guarantees separability of patients with respect to subtype-specific survival curves.

Methods: We propose a robust framework for clustering cancer patients into molecular subtypes by leveraging gene expression data. The framework is based on UMAP and OTRIMLE, a robust clustering method that adapts to noise. Survival analysis is performed for the discovered subtypes, and the clustering solution that maximizes the Restricted Life Expectancy difference (RLED) between the subtype-specific survival curves is chosen.

Results: The framework is tested on five cancer datasets obtained from TCGA, and the results are evaluated by comparison with Neighbourhood-based multi-omics (NEMO), Similarity Network Fusion (SNF) and Consensus Clustering (CC). The comparisons show that the proposed framework achieves better separation of subtype-specific survival curves. For instance, the minimum separation achieved between any two subtypes for GLIO is 94 days for the proposed framework, while it is about 57, 62 and 38 days for NEMO, SNF and CC, respectively. Further, over-representation analysis is performed for each subtype to identify significantly over-represented pathways, which can be potential treatment targets.

Conclusion: The results indicate that our approach is able to discover subtypes that have better separability in terms of survival, and its significance lies in reducing uncertainty in a patient’s expected outcome.
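To make the selection criterion more concrete, the following is a minimal sketch of how a restricted life expectancy difference could be computed. The abstract does not give the exact definition, so the sketch assumes that restricted life expectancy is the area under a Kaplan-Meier survival curve up to a horizon `tau`, and that separation is summarized as the smallest pairwise difference between subtypes. The function names and the parameter `tau` are illustrative and are not taken from the paper.

```python
from itertools import combinations

import numpy as np


def kaplan_meier(times, events):
    """Kaplan-Meier estimate: distinct event times and survival just after each."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)  # True = death observed, False = censored
    event_times = np.unique(times[events])
    surv, s = [], 1.0
    for t in event_times:
        at_risk = np.sum(times >= t)
        deaths = np.sum((times == t) & events)
        s *= 1.0 - deaths / at_risk
        surv.append(s)
    return event_times, np.array(surv)


def restricted_life_expectancy(times, events, tau):
    """Area under the Kaplan-Meier step curve on [0, tau] (assumed definition)."""
    t, s = kaplan_meier(times, events)
    keep = t < tau
    grid = np.concatenate(([0.0], t[keep], [tau]))
    heights = np.concatenate(([1.0], s[keep]))
    return float(np.sum(np.diff(grid) * heights))


def min_pairwise_rled(times, events, labels, tau):
    """Smallest absolute RLE difference between any two subtypes (assumed summary)."""
    times, events, labels = map(np.asarray, (times, events, labels))
    rle = {
        k: restricted_life_expectancy(times[labels == k], events[labels == k], tau)
        for k in np.unique(labels)
    }
    return min(abs(rle[a] - rle[b]) for a, b in combinations(rle, 2))
```

Under these assumptions, candidate clustering solutions could be ranked by `min_pairwise_rled`, with the largest value indicating the best-separated survival curves. The units of the result are those of `times`, for example days.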
