SimCTC: A Simple Contrast Learning Method of Text Clustering (Student Abstract)

Chen Li,Jia Wang,Bo Zou,Shuangyong Song,Xiaoguang Yu,Xiaodong He

doi:10.1609/aaai.v36i11.21635

Open Access

Journal: Proceedings of the ... AAAI Conference on Artificial Intelligence
Publication Date: Jun 28, 2022
Citations: 2

Affiliation: Sichuan University, JDSU (United States)

#Appropriate Number Of Clusters #Pre-trained BERT Model

Abstract

This paper presents SimCTC, a simple contrastive learning (CL) framework that advances the state of the art in text clustering. In SimCTC, a pre-trained BERT model first maps the input sequence to the representation space; the resulting representation then feeds three heads, each with its own loss function: a Clustering head, an Instance-CL head, and a Cluster-CL head. Experimental results on multiple benchmark datasets show that SimCTC outperforms six competitive text clustering methods, with a 1%-6% improvement in Accuracy (ACC) and a 1%-4% improvement in Normalized Mutual Information (NMI). Our results also show that clustering performance can be further improved by setting an appropriate number of clusters in the cluster-level objective.
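The abstract names three heads on top of a shared encoder but gives no loss formulas, so the following is only a rough sketch of how such a design could be wired together, not the authors' implementation. Everything specific in it is an assumption: random vectors stand in for BERT embeddings and their augmented views, the clustering head uses a DEC-style Student's t assignment with a KL objective, both contrastive heads use an NT-Xent loss (applied to rows for instances and to columns of the soft-assignment matrix for clusters), the projections `W_inst` and `W_clu` are hypothetical, and the three losses are summed with equal weight.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def softmax(x):
    """Row-wise softmax, shifted for numerical stability."""
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def nt_xent(a, b, temperature=0.5):
    """NT-Xent loss: row i of `a` and row i of `b` form a positive pair."""
    n = a.shape[0]
    z = l2_normalize(np.concatenate([a, b], axis=0))      # (2n, d)
    sim = z @ z.T / temperature                           # cosine similarities
    np.fill_diagonal(sim, -np.inf)                        # drop self-pairs
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    m = sim.max(axis=1, keepdims=True)
    log_prob = sim - m - np.log(np.exp(sim - m).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos].mean()

def clustering_loss(h, centroids, alpha=1.0):
    """Assumed DEC-style objective: KL(p || q) with Student's t assignments q."""
    d2 = ((h[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)  # (n, k)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    q = q / q.sum(axis=1, keepdims=True)
    w = q ** 2 / q.sum(axis=0, keepdims=True)             # sharpen assignments
    p = w / w.sum(axis=1, keepdims=True)                  # target distribution
    return (p * np.log(p / q)).sum(axis=1).mean()

rng = np.random.default_rng(0)
n, d, k = 8, 16, 4                        # batch size, embedding dim, clusters (illustrative)

h = rng.normal(size=(n, d))               # stand-in for BERT embeddings of the inputs
h1 = h + 0.1 * rng.normal(size=(n, d))    # stand-in for augmented view 1
h2 = h + 0.1 * rng.normal(size=(n, d))    # stand-in for augmented view 2

W_inst = rng.normal(size=(d, 8))          # hypothetical Instance-CL projection
W_clu = rng.normal(size=(d, k))           # hypothetical Cluster-CL projection
centroids = rng.normal(size=(k, d))       # hypothetical cluster centroids

# Clustering head: pull embeddings toward centroids.
loss_clustering = clustering_loss(h, centroids)

# Instance-CL head: contrast the two views of each text (rows are instances).
loss_instance = nt_xent(h1 @ W_inst, h2 @ W_inst)

# Cluster-CL head: contrast cluster columns of the two soft-assignment matrices.
y1, y2 = softmax(h1 @ W_clu), softmax(h2 @ W_clu)         # each (n, k)
loss_cluster_cl = nt_xent(y1.T, y2.T)

# Equal weighting is an assumption; the abstract does not state the weights.
total_loss = loss_clustering + loss_instance + loss_cluster_cl
print(loss_clustering, loss_instance, loss_cluster_cl, total_loss)
```

In this sketch the number of clusters `k` only enters through the width of `W_clu` and the number of centroids, which is where the cluster-level objective mentioned in the abstract would be tuned.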

Full Text