Abstract

In real-world applications, data sets often comprise multiple views, which provide consensus and complementary information to each other. Embedding learning is an effective strategy for nearest-neighbour search and dimensionality reduction in large data sets. This paper attempts to learn a unified probability distribution of the points across different views and generates a unified embedding in a low-dimensional space that optimally preserves neighbourhood identity. The probability distributions generated for each point in each view are combined by the conflation method to create a single unified distribution. The goal is to approximate this unified distribution as closely as possible when a similar operation is performed in the embedded space. As a cost function, the sum of Kullback-Leibler divergences over the samples is used, which leads to a simple gradient that adjusts the positions of the samples in the embedded space. The proposed methodology can generate embeddings from both complete and incomplete multi-view data sets. Finally, a multi-objective clustering technique (AMOSA) is applied to group the samples in the embedded space. The proposed methodology, Multi-view Neighbourhood Embedding (MvNE), shows an improvement of approximately 2-3% over state-of-the-art models when evaluated on 10 omics data sets.
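The combination step described above can be sketched in a few lines. The snippet below is an illustrative assumption, not the paper's implementation: it builds an SNE-style Gaussian-kernel neighbour distribution per view for a single point, conflates the per-view distributions by taking their normalized element-wise product (the standard conflation of discrete distributions), and measures the KL divergence that the cost function sums over samples. The function names and the fixed-bandwidth kernel are hypothetical choices for the sketch.

```python
import numpy as np

def view_distribution(dist_row, sigma=1.0):
    # SNE-style neighbour probabilities for one point in one view:
    # Gaussian kernel over distances to the other points, normalized to sum to 1.
    p = np.exp(-dist_row**2 / (2.0 * sigma**2))
    return p / p.sum()

def conflate(distributions):
    # Conflation of discrete distributions: element-wise product
    # of the per-view probabilities, renormalized to a single distribution.
    prod = np.prod(np.vstack(distributions), axis=0)
    return prod / prod.sum()

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q); eps guards against log(0) for near-zero probabilities.
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

# Toy usage: one point, distances to three neighbours in two views.
p1 = view_distribution(np.array([1.0, 2.0, 3.0]))
p2 = view_distribution(np.array([1.5, 1.0, 2.5]))
unified = conflate([p1, p2])          # unified target distribution
q = view_distribution(np.array([1.2, 1.8, 2.9]))  # stand-in for the embedding's distribution
cost = kl_divergence(unified, q)       # contribution of this point to the cost
```

In the full method this KL cost would be summed over all samples and minimized by gradient descent on the embedded coordinates; the sketch only shows how a single point's unified distribution and its divergence term arise.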

Highlights

  • In real-world applications, data sets often comprise multiple views, which provide consensus and complementary information to each other

  • Multi-view learning started with canonical correlation analysis (CCA)[3] and a series of works on co-training methods[4,5,6,7]

  • According to their mechanisms and principles, multi-view clustering methods can be broadly divided into four typical classes: (i) subspace-based: these models learn a unified feature representation from all the views[10,11,12,13,14,15,16]; (ii) late-fusion-based: models under this category combine the clustering results from multiple views to obtain the final clustering[16,17,18]; (iii) co-training-based: methods under this category treat multi-view data using a co-training strategy; (iv) spectral-based: methods under this category learn an optimal similarity matrix to capture the structure of the clusters, which serves as an affinity matrix for spectral clustering[19,20,21]

Introduction

Data sets are often composed of multiple views, which provide consensus and complementary information to each other. According to their mechanisms and principles, multi-view clustering methods can be broadly divided into four typical classes: (i) subspace-based: these models learn a unified feature representation from all the views[10,11,12,13,14,15,16]; (ii) late-fusion-based: models under this category combine the clustering results from multiple views to obtain the final clustering[16,17,18]; (iii) co-training-based: methods under this category treat multi-view data using a co-training strategy; (iv) spectral-based: methods under this category learn an optimal similarity matrix to capture the structure of the clusters, which serves as an affinity matrix for spectral clustering[19,20,21]. Amongst this wide variety of multi-view clustering methods, subspace-based ones perform better and are widely studied. The Latent Multi-View Subspace Clustering (LMSC) algorithm has two variants: linear LMSC (lLMSC) and generalized LMSC (gLMSC). lLMSC assumes a linear correlation between each view and the latent representation, whereas gLMSC uses neural networks to capture generalized relationships between the views.

