Deep clustering outperforms conventional clustering by mutually promoting representation learning and cluster assignment. However, most existing deep clustering methods suffer from two major drawbacks. First, most cluster assignment methods depend heavily on an intermediate target distribution generated by a handcrafted nonlinear mapping function. Second, the clustering results can easily be guided in the wrong direction by misassigned samples within each cluster, and existing deep clustering methods are incapable of discriminating such samples. These limitations cap the performance that deep clustering methods can achieve. To address these issues, a novel Self-Supervised Clustering (SSC) framework is proposed, which boosts clustering performance through classification in an unsupervised manner. In each training epoch, fuzzy theory is used to score each sample's membership to the clusters as a probability, which quantifies the certainty of the intermediate clustering result for that sample. The most reliable samples are then selected according to their memberships and enhanced by data augmentation. These augmented data are employed to fine-tune an off-the-shelf deep network classifier, in a self-supervised way, with the labels provided by the clustering. The classification results on the original dataset are used as the target distribution to guide the training of the deep clustering model. The proposed framework can efficiently discriminate sample outliers and generate a better target distribution with the assistance of the powerful classifier. Extensive experiments indicate that the proposed framework remarkably outperforms state-of-the-art deep clustering methods on four benchmark datasets.
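The membership-based sample selection described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the use of the maximum membership as the certainty score, and the fixed confidence threshold are all assumptions, since the abstract does not specify the exact fuzzy scoring rule.

```python
import numpy as np

def select_reliable_samples(memberships, threshold=0.9):
    """Pick samples whose strongest cluster membership exceeds a
    confidence threshold (hypothetical criterion; the paper's exact
    fuzzy scoring rule is not given in the abstract)."""
    # memberships: (n_samples, n_clusters) soft assignments summing to 1
    labels = memberships.argmax(axis=1)   # pseudo-labels from the clustering
    certainty = memberships.max(axis=1)   # per-sample confidence score
    reliable = certainty >= threshold     # keep only the most certain samples
    return np.where(reliable)[0], labels[reliable]

# toy example: three samples, two clusters
m = np.array([[0.95, 0.05],
              [0.55, 0.45],
              [0.10, 0.90]])
idx, pseudo = select_reliable_samples(m, threshold=0.9)
# samples 0 and 2 are confident enough; sample 1 is ambiguous
```

The selected indices and pseudo-labels would then feed the augmentation and classifier fine-tuning stage, while ambiguous samples (potential outliers) are held out of that epoch's self-supervised update.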