Clustering has long been an important tool for problem-solving. In the era of big data, most classical clustering methods suffer from the curse of dimensionality and from scalability issues. Recently, deep clustering models have attracted growing attention because they can handle complex, high-dimensional, and large-scale datasets. They are appealing for their strong representational capacity and fast inference speed. The main remaining problem in clustering high-dimensional data is finding an appropriately compressed representation that semantically preserves cluster structures. Without labels, defining an objective function that encourages a suitable representation becomes a critical question. After several years of stagnation, impressive results have been achieved in the last two years. This paper presents a comprehensive and up-to-date review of deep clustering methods. We first introduce the basic concepts shared by several deep clustering algorithms, the available network architectures, and the optimization strategies. We then review each family of methods in detail by analyzing its most representative algorithms. These algorithms are assessed both on their classification accuracy and from a multi-criteria perspective, to help investigators select the most appropriate solution. Finally, we give an overview of the diversity of tasks and application domains, and discuss current issues and challenges.