Abstract

In real-world applications, data sets often comprise multiple views, which provide consensus and complementary information to each other. Embedding learning is an effective strategy for nearest-neighbour search and dimensionality reduction in large data sets. This paper attempts to learn a unified probability distribution of the points across different views and generates a unified embedding in a low-dimensional space that optimally preserves neighbourhood identity. The probability distributions generated for each point in each view are combined by the conflation method to create a single unified distribution. The goal is to approximate this unified distribution as closely as possible when a similar operation is performed in the embedded space. As a cost function, the sum of Kullback-Leibler divergences over the samples is used, which leads to a simple gradient that adjusts the positions of the samples in the embedded space. The proposed methodology can generate embeddings from both complete and incomplete multi-view data sets. Finally, a multi-objective clustering technique (AMOSA) is applied to group the samples in the embedded space. The proposed methodology, Multi-view Neighbourhood Embedding (MvNE), shows an improvement of approximately 2-3% over state-of-the-art models when evaluated on 10 omics data sets.
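The combination step described above can be sketched in a few lines. The snippet below is an illustrative assumption, not the paper's implementation: it builds an SNE-style Gaussian-kernel neighbour distribution per view for a single point, conflates the per-view distributions by taking their normalized element-wise product (the standard conflation of discrete distributions), and measures the KL divergence that the cost function sums over samples. The function names and the fixed-bandwidth kernel are hypothetical choices for the sketch.

```python
import numpy as np

def view_distribution(dist_row, sigma=1.0):
    # SNE-style neighbour probabilities for one point in one view:
    # Gaussian kernel over distances to the other points, normalized to sum to 1.
    p = np.exp(-dist_row**2 / (2.0 * sigma**2))
    return p / p.sum()

def conflate(distributions):
    # Conflation of discrete distributions: element-wise product
    # of the per-view probabilities, renormalized to a single distribution.
    prod = np.prod(np.vstack(distributions), axis=0)
    return prod / prod.sum()

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q); eps guards against log(0) for near-zero probabilities.
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

# Toy usage: one point, distances to three neighbours in two views.
p1 = view_distribution(np.array([1.0, 2.0, 3.0]))
p2 = view_distribution(np.array([1.5, 1.0, 2.5]))
unified = conflate([p1, p2])          # unified target distribution
q = view_distribution(np.array([1.2, 1.8, 2.9]))  # stand-in for the embedding's distribution
cost = kl_divergence(unified, q)       # contribution of this point to the cost
```

In the full method this KL cost would be summed over all samples and minimized by gradient descent on the embedded coordinates; the sketch only shows how a single point's unified distribution and its divergence term arise.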

Highlights

  • In real-world applications, data sets often comprise multiple views, which provide consensus and complementary information to each other

  • Multi-view learning started with canonical correlation analysis (CCA)[3] and a series of works on co-training methods[4,5,6,7]

  • According to their mechanisms and principles, multi-view clustering methods can be broadly divided into four typical classes: (i) subspace-based: these models learn a unified feature representation from all the views[10,11,12,13,14,15,16]; (ii) late-fusion-based: models under this category combine the clustering results from multiple views to obtain the final clustering[16,17,18]; (iii) co-training-based: methods under this category treat multi-view data using a co-training strategy; (iv) spectral-based: methods under this category learn an optimal similarity matrix to capture the structure of the clusters, which serves as an affinity matrix for spectral clustering[19,20,21]

Introduction

Data sets are often composed of multiple views, which provide consensus and complementary information to each other. According to their mechanisms and principles, multi-view clustering methods can be broadly divided into four typical classes: (i) subspace-based: these models learn a unified feature representation from all the views[10,11,12,13,14,15,16]; (ii) late-fusion-based: models under this category combine the clustering results from multiple views to obtain the final clustering[16,17,18]; (iii) co-training-based: methods under this category treat multi-view data using a co-training strategy; (iv) spectral-based: methods under this category learn an optimal similarity matrix to capture the structure of the clusters, which serves as an affinity matrix for spectral clustering[19,20,21]. Amongst this wide variety of multi-view clustering methods, subspace-based ones perform better and are widely studied. The Latent Multi-View Subspace Clustering (LMSC) algorithm has two variants: linear LMSC (lLMSC) and generalized LMSC (gLMSC). lLMSC assumes a linear correlation between each view and the latent representation, whereas gLMSC uses neural networks to capture generalized relationships between the views.

