Abstract
The aim of our paper is to introduce a new deep clustering model called Multi-head Cross-Attention Contrastive Clustering (Multi-CC), which seeks to enhance the performance of the existing deep clustering model CC. Our approach involves first augmenting the data to form image pairs and then using the same backbone to extract the feature representation of these image pairs. We then undertake contrastive learning, separately in the row space and column space of the feature matrix, to jointly learn the instance and cluster representations. Our approach offers several key improvements over the existing model. Firstly, we use a mixed strategy of strong and weak augmentation to construct image pairs. Secondly, we get rid of the pooling layer of the backbone to prevent loss of information. Finally, we introduce a multi-head cross-attention module to improve the model’s performance. These improvements have allowed us to reduce the model training time by 80%. As a baseline, Multi-CC achieves the best results on CIFAR-10, ImageNet-10, and ImageNet-dogs. It is easily replaceable with CC, making models based on CC achieve better performance.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.