Image clustering using generated text centroids

Daehyeon Kong, Kyeongbo Kong, Suk-Ju Kang

doi:10.1016/j.image.2024.117128

Abstract

In recent years, deep neural networks pretrained on large-scale datasets have been used to compensate for data deficiency and to improve performance through prior knowledge. Contrastive language–image pretraining (CLIP), a vision-language model pretrained on an extensive dataset, achieves strong performance in image recognition. In this study, we bring multimodality to image clustering, moving from a single-modality to a multimodal framework by exploiting the describability property of the CLIP image encoder. This shift matters because multimodality provides richer feature representations. The proposed method generates text centroids to which the image features are assigned, effectively creating a common descriptive language for each cluster and improving clustering performance. The text centroids are trained using the output of a standard clustering algorithm as pseudo-labels, so that each centroid learns a common description of its cluster. The image features, which lie in the same embedding space, are then assigned to the text centroids. Although the only addition is the text centroids, clustering performance improves significantly over the standard clustering algorithm, especially on complex datasets. With the proposed method, the normalized mutual information score rises by 32% on the Stanford40 dataset and by 64% on ImageNet-Dog compared with the k-means clustering algorithm.
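The pipeline described in the abstract (cluster with a standard algorithm, use the result as pseudo-labels to learn one centroid per cluster, then reassign the image features to those centroids) can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration rather than the authors' implementation: random vectors stand in for CLIP image features, the centroids are free vectors optimized directly in the embedding space instead of being produced by the CLIP text encoder, and the softmax cross-entropy loss on cosine similarity, the temperature, and the helper names `kmeans`, `fit_centroids`, and `assign` are all hypothetical choices, since the abstract does not specify them.

```python
import numpy as np

def l2_normalize(x):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def kmeans(x, k, iters=50, seed=0):
    """Plain Lloyd's k-means; the hard labels serve as pseudo-labels."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(1)
        for j in range(k):
            if np.any(labels == j):  # leave an empty cluster's center unchanged
                centers[j] = x[labels == j].mean(0)
    return labels

def fit_centroids(x, pseudo_labels, k, steps=200, lr=0.5, temperature=0.1, seed=0):
    """Learn k unit-norm centroids with softmax cross-entropy on cosine similarity.

    Assumption: in the paper the centroids come from the text side of CLIP;
    here they are free parameters, used only as a stand-in.
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(size=(k, x.shape[1]))
    onehot = np.eye(k)[pseudo_labels]
    for _ in range(steps):
        norm = np.linalg.norm(w, axis=1, keepdims=True)
        c = w / norm
        logits = x @ c.T / temperature
        logits -= logits.max(1, keepdims=True)  # numerical stability
        p = np.exp(logits)
        p /= p.sum(1, keepdims=True)
        grad_c = (p - onehot).T @ x / (len(x) * temperature)
        # Back-propagate through the normalization c = w / ||w||.
        grad_w = (grad_c - (grad_c * c).sum(1, keepdims=True) * c) / norm
        w -= lr * grad_w
    return l2_normalize(w)

def assign(x, centroids):
    """Assign each feature to its most similar centroid (cosine similarity)."""
    return (x @ centroids.T).argmax(axis=1)

# Synthetic stand-in for image features: 3 noisy groups in a 16-dim space.
rng = np.random.default_rng(0)
true_centers = rng.normal(size=(3, 16))
features = l2_normalize(
    np.repeat(true_centers, 40, axis=0) + 0.1 * rng.normal(size=(120, 16))
)

pseudo_labels = kmeans(features, k=3)
centroids = fit_centroids(features, pseudo_labels, k=3)
assignments = assign(features, centroids)
print("agreement with pseudo-labels:", (assignments == pseudo_labels).mean())
```

On toy data like this the reassignment mostly reproduces the pseudo-labels; any gain of the kind the abstract reports would have to come from the text encoder's prior knowledge, which this sketch deliberately leaves out.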

More From: Signal Processing: Image Communication
