Integration of pan-cancer multi-omics data for novel mixed subgroup identification using machine learning methods.

Seema Khadirnaikar,S R M Prasanna,Sudhanshu Shukla,Amy Mccart Reed

doi:10.1371/journal.pone.0287176

Seema Khadirnaikar, S R M Prasanna + Show 2 more

Open Access

https://doi.org/10.1371/journal.pone.0287176

Copy DOI

Journal: PloS one	Publication Date: Oct 19, 2023
Citations: 1	License type: CC BY 4.0

Affiliation: Indian Institute of Technology Dharwad

Abstract

Cancer is a heterogeneous disease, and patients with tumors from different organs can share similar epigenetic and genetic alterations. Therefore, it is crucial to identify the novel subgroups of patients with similar molecular characteristics. It is possible to propose a better treatment strategy when the heterogeneity of the patient is accounted for during subgroup identification, irrespective of the tissue of origin. This work proposes a machine learning (ML) based pipeline for subgroup identification in pan-cancer. Here, mRNA, miRNA, DNA methylation, and protein expression features from pan-cancer samples were concatenated and non-linearly projected to a lower dimension using an ML algorithm. This data was then clustered to identify multi-omics-based novel subgroups. The clinical characterization of these ML subgroups indicated significant differences in overall survival (OS) and disease-free survival (DFS) (p-value<0.0001). The subgroups formed by the patients from different tumors shared similar molecular alterations in terms of immune microenvironment, mutation profile, and enriched pathways. Further, decision-level and feature-level fused classification models were built to identify the novel subgroups for unseen samples. Additionally, the classification models were used to obtain the class labels for the validation samples, and the molecular characteristics were verified. To summarize, this work identified novel ML subgroups using multi-omics data and showed that the patients with different tumor types could be similar molecularly. We also proposed and validated the classification models for subgroup identification. The proposed classification models can be used to identify the novel multi-omics subgroups, and the molecular characteristics of each subgroup can be used to design appropriate treatment regimen.

Full Text