Co-clustering is to cluster samples and features simultaneously, which can also reveal the relationship between row clusters and column clusters. Therefore, lots of scientists have drawn much attention to conduct extensive research on it, and co-clustering is widely used in recommendation systems, gene analysis, medical data analysis, natural language processing, image analysis, and social network analysis. In this article, we survey the entire research aspect of co-clustering, especially the latest advances in co-clustering, and discover the current research challenges and future directions. First, due to different views from researchers on the definition of co-clustering, this article summarizes the definition of co-clustering and its extended definitions, as well as related issues, based on the perspectives of various scientists. Second, existing co-clustering techniques are approximately categorized into four classes: information-theory-based, graph-theory-based, matrix-factorization-based, and other theories-based. Third, co-clustering is applied in various aspects such as recommendation systems, medical data analysis, natural language processing, image analysis, and social network analysis. Furthermore, 10 popular co-clustering algorithms are empirically studied on 10 benchmark datasets with 4 metrics—accuracy, purity, block discriminant index, and running time, and their results are objectively reported. Finally, future work is provided to get insights into the research challenges of co-clustering.
Read full abstract