Abstract

Clustering is a fundamental machine learning task that aims to assign instances to groups so that similar samples belong to the same cluster while dissimilar samples belong to different clusters. Shallow clustering methods usually assume that data are collected and expressed as feature vectors on which clustering is performed. However, high-dimensional data such as images, texts, videos, and graphs pose significant challenges for clustering, such as non-discriminative representations and intricate relationships among instances. Over the past decades, deep learning has achieved remarkable success in representation learning and in modeling complex relationships. Motivated by these advances, deep clustering seeks to improve clustering outcomes through deep learning techniques and has attracted considerable interest from both academia and industry. Despite many contributions to this vibrant area of research, the lack of systematic analysis and a comprehensive taxonomy has hindered progress in the field. In this survey, we first explore how deep learning can be integrated into clustering and identify two fundamental components: the representation learning module and the clustering module. Then, we summarize and analyze representative designs of these two modules. Furthermore, we introduce a novel taxonomy of deep clustering based on how these two modules interact: multistage, generative, iterative, and simultaneous approaches. In addition, we present well-known benchmark datasets, evaluation metrics, and open-source tools to support a clear experimental comparison of different approaches. Finally, we examine practical applications of deep clustering and identify challenging directions for future research.
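As a rough illustration of the two modules and of the simplest interaction pattern (multistage: learn representations first, then cluster them), the sketch below uses NumPy only. It is not the survey's method. Assumptions: PCA computed via SVD stands in for a trained deep encoder as the representation learning module, Lloyd's k-means serves as the clustering module, and the function names `encode`, `kmeans`, and `multistage_clustering` are hypothetical.

```python
import numpy as np


def encode(x: np.ndarray, dim: int) -> np.ndarray:
    """Representation module (stand-in): project data onto its top `dim`
    principal components. A deep clustering method would use a trained
    neural encoder here instead of this linear projection."""
    centered = x - x.mean(axis=0)
    # Rows of vt are the principal directions, ordered by singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:dim].T


def kmeans(z: np.ndarray, k: int, n_iter: int = 50, seed: int = 0) -> np.ndarray:
    """Clustering module: Lloyd's k-means on the learned representation."""
    rng = np.random.default_rng(seed)
    centers = z[rng.choice(len(z), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each embedding to its nearest center (squared Euclidean distance).
        dists = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its points; keep it if the cluster is empty.
        new_centers = np.array([
            z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def multistage_clustering(x: np.ndarray, k: int, dim: int) -> np.ndarray:
    """Multistage pipeline: the two modules run one after the other,
    with no feedback from clustering to representation learning."""
    return kmeans(encode(x, dim), k)


# Usage: two synthetic groups of 100-dimensional samples.
rng = np.random.default_rng(42)
group_a = rng.normal(0.0, 1.0, size=(50, 100))
group_b = rng.normal(5.0, 1.0, size=(50, 100))
x = np.vstack([group_a, group_b])
labels = multistage_clustering(x, k=2, dim=2)
```

In the generative, iterative, and simultaneous families named above, the two modules would instead exchange information, for example by feeding cluster assignments back into the encoder's training objective; this sketch deliberately omits that coupling.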
