Abstract

In the big data era, data are generated from different sources or observed from different views. Such data are referred to as multi-view data. Unleashing the power of the knowledge in multi-view data is very important in big data mining and analysis. This calls for advanced techniques that account for the diversity of the different views while fusing the data. Multi-view Clustering (MvC) has attracted increasing attention in recent years because it aims to exploit complementary and consensus information across multiple views. This paper summarizes a large number of multi-view clustering algorithms, provides a taxonomy according to the mechanisms and principles involved, and classifies these algorithms into five categories, namely, co-training style algorithms, multi-kernel learning, multi-view graph clustering, multi-view subspace clustering, and multi-task multi-view clustering. Among these, multi-view graph clustering is further categorized into graph-based, network-based, and spectral-based methods. Multi-view subspace clustering is further divided into subspace learning-based and non-negative matrix factorization-based methods. This paper not only introduces the mechanisms of each category of methods but also gives a few examples of how these techniques are used. In addition, it lists some publicly available multi-view datasets. Overall, this paper serves as an introductory text and survey for multi-view clustering.

Highlights

  • In many real-world applications of big data mining and analysis, data are collected from different sources in diverse domains or obtained from various feature collectors

  • We provide a brief summary, including multi-modal clustering based on Markov random fields [175], multi-view clustering ensembles that combine multi-view spectral clustering and multi-view kernel K-means clustering with ensemble techniques [176], bi-level weighted Multi-view Clustering (MvC) based on K-means [177–180], and multi-view fuzzy clustering [181–184]

  • This paper surveys most of the existing MvC algorithms and technologies and classifies them into five categories, i.e., co-training style algorithms, multi-kernel learning, multi-view graph clustering, multi-view subspace clustering, and multi-task multi-view clustering

Summary

Introduction

In many real-world applications of big data mining and analysis, data are collected from different sources in diverse domains or obtained from various feature collectors. The survey (Big Data Mining and Analytics, June 2018, 1(2): 83-107) organizes and summarizes MvC methods in five categories, including:

  • Co-training style algorithms: This category of methods handles multi-view data with a co-training strategy. It bootstraps the clustering of different views by using prior or learned knowledge from one another.

  • Multi-kernel learning: This category of methods uses predefined kernels corresponding to different views and combines these kernels either linearly or non-linearly in order to improve clustering performance.

  • Multi-view graph clustering: This category of methods seeks a fusion graph (or network) across all views and applies graph-cut algorithms or other techniques (e.g., spectral clustering) to the fusion graph in order to produce the clustering result.
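As a rough illustration of how the multi-kernel and graph-based ideas above can fit together, the following minimal sketch builds one Gaussian kernel per view, combines the kernels linearly with uniform weights, and splits the fused similarity graph into two clusters with a basic spectral cut. It is not taken from the survey: the toy data, the function names (`rbf_kernel`, `fuse_kernels`, `spectral_bipartition`), the uniform weights, and the value of `gamma` are all assumptions chosen for illustration, and actual MvC methods typically learn the view weights rather than fixing them.

```python
import numpy as np

def rbf_kernel(X, gamma=0.5):
    """Gaussian (RBF) kernel matrix between the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def fuse_kernels(kernels, weights=None):
    """Linear combination of per-view kernels (uniform weights if none given)."""
    if weights is None:
        weights = np.full(len(kernels), 1.0 / len(kernels))
    return sum(w * K for w, K in zip(weights, kernels))

def spectral_bipartition(W):
    """Two-way split using the second eigenvector of the normalized Laplacian."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)       # eigenvalues in ascending order
    fiedler = vecs[:, 1]              # eigenvector of the second-smallest eigenvalue
    return (fiedler > 0).astype(int)  # cluster label = sign of the entry

# Hypothetical toy data: the same six samples described by two views.
view_a = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
                   [2.0, 2.0], [2.1, 1.9], [1.8, 2.2]])
view_b = np.array([[1.0], [1.2], [0.9], [3.0], [3.1], [2.8]])

fused = fuse_kernels([rbf_kernel(view_a), rbf_kernel(view_b)])
labels = spectral_bipartition(fused)
print(labels)  # first three samples share one label, last three the other
```

Which cluster receives label 0 or 1 depends on the sign of the computed eigenvector, so only the grouping of the samples is meaningful. For more than two clusters, one would usually keep several eigenvectors and run K-means on their rows instead of thresholding a single vector.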

Principles of MvC
Co-training style algorithms
Multi-kernel learning
Multi-view graph clustering
Network-based MvC
Spectral-based MvC
Multi-view subspace clustering
Subspace learning-based MvC
NMF-based MvC
Multi-task multi-view clustering
Publicly Available Datasets
Conclusion and Discussion