Abstract

In this paper, we present a network-based clustering method, called vector Wasserstein clustering (vWCluster), based on the vector-valued Wasserstein distance derived from optimal mass transport (OMT) theory. This approach naturally integrates multi-layer representations of data on a given network, from which clusters are derived via hierarchical clustering. We applied the methodology to multi-omics data from the two largest breast cancer studies. In both datasets, the resulting clusters showed significantly different survival in Kaplan-Meier analysis. CIBERSORT scores were also compared among the identified clusters: of the 22 CIBERSORT immune cell types, 9 differed significantly in both datasets, suggesting differences in the tumor immune microenvironment across clusters. vWCluster can aggregate multi-omics data represented in vector form on a multi-layer network, taking into account the concordant effect of heterogeneous data, and can further identify subgroups of tumors that differ in mortality.
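Once a pairwise distance matrix between samples is available (for example, vector-valued Wasserstein distances between tumors), hierarchical clustering can be run directly on it. The following is a minimal sketch using SciPy, not the authors' implementation. The helper name `cluster_from_distances`, the choice of average linkage, the number of clusters, and the toy distance matrix are all assumptions for illustration; the abstract does not state which linkage rule or cut criterion vWCluster uses.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def cluster_from_distances(dist, n_clusters, method="average"):
    """Agglomerative clustering from a precomputed symmetric distance matrix.

    Hypothetical helper. `method` is the linkage rule (an assumption here;
    "average" does not require the Euclidean geometry that "ward" assumes).
    """
    dist = np.asarray(dist, dtype=float)
    # linkage() expects a condensed (upper-triangular) distance vector;
    # checks=True verifies symmetry and a zero diagonal.
    condensed = squareform(dist, checks=True)
    Z = linkage(condensed, method=method)
    # Cut the dendrogram so that at most n_clusters flat clusters remain.
    return fcluster(Z, t=n_clusters, criterion="maxclust")


# Toy example (made-up distances): samples 0 and 1 are close, as are 2 and 3.
D = np.array([
    [0.0, 1.0, 5.0, 5.0],
    [1.0, 0.0, 5.0, 5.0],
    [5.0, 5.0, 0.0, 1.0],
    [5.0, 5.0, 1.0, 0.0],
])
labels = cluster_from_distances(D, n_clusters=2)
print(labels)  # samples {0, 1} and {2, 3} fall in different clusters
```

Working from a precomputed matrix keeps the clustering step independent of how the distances were obtained, so any sample-to-sample Wasserstein distance can be plugged in unchanged.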

Highlights

  • Current large-scale cancer genome projects, such as The Cancer Genome Atlas (TCGA), provide a comprehensive molecular portrait of human cancers, including gene expression, copy number variation (CNV), and DNA methylation profiles

  • We developed a vector-valued optimal mass transport (OMT) approach that integrates multi-omics data represented in a multi-layer network, on which we applied the W1 Wasserstein distance (EMD)

  • The vector-valued Wasserstein distance was computed on gene expression and CNV data for METABRIC data
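To illustrate what a W1 distance on a network means, here is a minimal sketch of the graph earth mover's distance written as a min-cost-flow linear program (the Beckmann formulation): minimize the sum over edges of w_e |J_e| subject to the flow J moving the mass of one density onto the other. The vector-valued part is deliberately simplified and is an assumption, not the vWCluster formulation: the norm across layers is taken to be l1 and no mass is allowed to move between layers, so the problem separates into one graph W1 per layer (e.g., one for gene expression, one for CNV) and the results add up. Vector-valued OMT formulations can instead couple layers through a different norm and allow inter-layer transport. The names `graph_w1` and `vector_w1_l1`, the path graph, and the densities are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def graph_w1(n_nodes, edges, weights, rho0, rho1):
    """W1 (EMD) between two densities on an undirected graph via the
    Beckmann LP: minimize sum_e w_e |J_e| subject to D J = rho0 - rho1."""
    rho0 = np.asarray(rho0, dtype=float)
    rho1 = np.asarray(rho1, dtype=float)
    if not np.isclose(rho0.sum(), rho1.sum()):
        raise ValueError("densities must have equal total mass")
    m = len(edges)
    # Node-edge incidence matrix; edge (u, v) is oriented u -> v, so
    # (D @ J)[u] is the net outflow of mass at node u.
    D = np.zeros((n_nodes, m))
    for e, (u, v) in enumerate(edges):
        D[u, e] = 1.0
        D[v, e] = -1.0
    w = np.asarray(weights, dtype=float)
    # Split the signed flow J = J_plus - J_minus (both >= 0) so |J_e| is linear.
    cost = np.concatenate([w, w])
    A_eq = np.hstack([D, -D])
    res = linprog(cost, A_eq=A_eq, b_eq=rho0 - rho1,
                  bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.fun


def vector_w1_l1(n_nodes, edges, weights, rho0_layers, rho1_layers):
    """Simplified vector-valued W1 (assumption): l1 norm across layers and
    no inter-layer transport, so it is a sum of per-layer graph W1 values."""
    return sum(graph_w1(n_nodes, edges, weights, r0, r1)
               for r0, r1 in zip(rho0_layers, rho1_layers))


# Toy path graph 0 - 1 - 2 with unit edge weights (hypothetical data).
edges = [(0, 1), (1, 2)]
weights = [1.0, 1.0]
expr0, expr1 = [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]  # layer 1: W1 = 2
cnv0, cnv1 = [0.5, 0.5, 0.0], [0.0, 0.5, 0.5]    # layer 2: W1 = 1
d = vector_w1_l1(3, edges, weights, [expr0, cnv0], [expr1, cnv1])
print(d)  # approximately 3.0
```

Computing this distance for every pair of samples yields the matrix that a hierarchical clustering step would consume.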


Summary

Introduction

Current large-scale cancer genome projects, such as The Cancer Genome Atlas (TCGA), provide a comprehensive molecular portrait of human cancers, including gene expression, copy number variation (CNV), and DNA methylation profiles. These data offer unprecedented opportunities for exploring cancer biology, which is characterized by various molecular functions and their complex interactions. We developed a vector-valued OMT approach that integrates multi-omics data represented in a multi-layer network, on which we applied the W1 Wasserstein distance, also known as the earth mover's distance (EMD).

The OMT problem underlying the W1 distance was first formulated by the French civil engineer and mathematician Gaspard Monge in 1781 [6, 7, 22, 23]. It was motivated by the problem of finding the optimal plan, relative to a given cost, for moving a pile of soil from one location to another in a mass-preserving manner. Monge's original formulation of OMT, in which the cost function is given by the distance, may be expressed in modern form as follows [6, 7]:

$$\inf_{T} \int \lVert x - T(x) \rVert \, \mu_0(x) \, dx \quad \text{subject to} \quad T_{\#}\mu_0 = \mu_1,$$

where $\mu_0$ and $\mu_1$ are nonnegative densities of equal total mass, the infimum is taken over transport maps $T$, and $T_{\#}\mu_0$ denotes the push-forward of $\mu_0$ by $T$.
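To make the distance-based cost concrete, here is a minimal sketch of a discrete W1 computation between two weighted point sets on the real line. Note that it solves the Kantorovich linear-programming relaxation of Monge's problem (a transport plan P instead of a map T) with cost |x_i - y_j|, and checks the result against SciPy's one-dimensional `wasserstein_distance`. The helper name `w1_transport_lp` and the toy data are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog
from scipy.stats import wasserstein_distance


def w1_transport_lp(x, a, y, b):
    """Discrete W1 between sum_i a_i delta_{x_i} and sum_j b_j delta_{y_j}
    via the Kantorovich LP with cost c_ij = |x_i - y_j|."""
    x, a, y, b = (np.asarray(v, dtype=float) for v in (x, a, y, b))
    n, m = len(x), len(y)
    # Cost matrix flattened row-major, matching P.ravel().
    cost = np.abs(x[:, None] - y[None, :]).ravel()
    row_sums = np.kron(np.eye(n), np.ones((1, m)))  # sum_j P_ij = a_i
    col_sums = np.kron(np.ones((1, n)), np.eye(m))  # sum_i P_ij = b_j
    res = linprog(cost,
                  A_eq=np.vstack([row_sums, col_sums]),
                  b_eq=np.concatenate([a, b]),
                  bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.fun


# Toy measures of equal (unit) total mass.
x, a = [0.0, 1.0, 3.0], [0.5, 0.25, 0.25]
y, b = [1.0, 2.0], [0.5, 0.5]
print(w1_transport_lp(x, a, y, b))       # approximately 1.0
print(wasserstein_distance(x, y, a, b))  # approximately 1.0
```

Here the optimal plan moves 0.5 units from 0 to 1, 0.25 from 1 to 2, and 0.25 from 3 to 2, for a total cost of 1.0. In one dimension W1 also equals the integral of the absolute difference of the two cumulative distribution functions, which is what the SciPy routine uses; the LP form is the one that generalizes to other ground costs.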

