Abstract

The development of high-throughput technology enables the measurement of many types of omics data, but meaningful integration of different data types remains a significant challenge. Another difficult and vital challenge is the discovery of cancer molecular subtypes with relevant clinical differences. Here we propose a novel method, called perturbation network fusion clustering (PNF), for multi-omics data integration and cancer subtyping, which addresses both challenges. We combine the connectivity and similarity of patient pairs, adding statistical knowledge to the similarity network. First, we adopt perturbation clustering to obtain, for each data type, the probability (i.e., connectivity) that any two patients are grouped into the same cluster, and we calculate the similarity between any two patients using the Gaussian kernel function. Next, using the connectivity and similarity matrices, we generate multiple stable and strong similarity kernels. Finally, we use the similarity network fusion strategy to fuse the similarity kernels from each omics data type and apply spectral clustering to discover cancer subtypes with survival differences. The method is validated on simulated data and on six cancer datasets from The Cancer Genome Atlas (TCGA) comprising nearly a thousand patient samples, including gene expression, microRNA, and DNA methylation data. PNF accurately identifies known cancer subtypes and novel subgroups of patients with significantly different survival profiles. Because it requires no prior knowledge, the method is general enough to replace existing multi-omics data integration and unsupervised clustering methods, including outside the scope of biomedical research.
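
As a rough illustration of the first two ingredients described above, the following minimal sketch computes a Gaussian-kernel similarity matrix and a connectivity matrix for one data type, then combines them. It is not the authors' implementation. The function names, the toy data, the kernel width `sigma`, and the hard-coded cluster labels (standing in for the output of repeated perturbed clustering runs) are all hypothetical. In particular, the abstract does not state how connectivity and similarity are combined, so the element-wise product used here is purely an assumption. The network fusion and spectral clustering steps are not shown.

```python
import math


def gaussian_similarity(X, sigma=1.0):
    """Pairwise Gaussian-kernel similarity between patients (rows of X)."""
    n = len(X)
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
            S[i][j] = math.exp(-sq_dist / (2.0 * sigma ** 2))
    return S


def connectivity(label_runs):
    """Fraction of clustering runs in which each patient pair shares a cluster."""
    n = len(label_runs[0])
    counts = [[0] * n for _ in range(n)]
    for labels in label_runs:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    runs = len(label_runs)
    return [[c / runs for c in row] for row in counts]


def combine(S, C):
    """ASSUMPTION: element-wise product of similarity and connectivity.

    The abstract does not specify the combination rule; this is a placeholder.
    """
    return [[s * c for s, c in zip(s_row, c_row)] for s_row, c_row in zip(S, C)]


# Hypothetical toy data: four patients, two features, one omics data type.
X = [[0.0, 0.0], [0.1, 0.0], [3.0, 3.0], [3.1, 3.0]]

# Hypothetical cluster labels from three perturbed clustering runs.
label_runs = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
]

S = gaussian_similarity(X, sigma=1.0)
C = connectivity(label_runs)
K = combine(S, C)
```

Under this assumed combination rule, a patient pair that never co-clusters across the perturbed runs receives a kernel value of zero regardless of its raw similarity, which is one way in which connectivity could make the resulting kernel more stable.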
