Abstract

Fine-tuning pre-trained language models (e.g., BERT) has achieved great success in many language understanding tasks in supervised settings (e.g., text classification). However, relatively little work has focused on applying pre-trained models in unsupervised settings such as text clustering. In this paper, we propose a novel method to fine-tune pre-trained models for text clustering without supervision, simultaneously learning text representations and cluster assignments using a clustering-oriented loss. Experiments on three text clustering datasets (namely TREC-6, Yelp, and DBpedia) show that our model outperforms the baseline methods and achieves state-of-the-art results.
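
The abstract does not specify the clustering-oriented loss. One widely used objective in joint representation-and-clustering work is the DEC loss (Xie et al., 2016, cited in the introduction below); it is sketched here purely as an illustration and may differ from the paper's actual formulation. Writing $z_i$ for the representation of text $i$ and $\mu_j$ for the $j$-th cluster centroid, the soft assignments $Q$, the sharpened target distribution $P$, and the loss $L$ take the form

\[
q_{ij} = \frac{\left(1 + \lVert z_i - \mu_j \rVert^2\right)^{-1}}{\sum_{j'} \left(1 + \lVert z_i - \mu_{j'} \rVert^2\right)^{-1}},
\qquad
p_{ij} = \frac{q_{ij}^2 / \sum_i q_{ij}}{\sum_{j'} \left(q_{ij'}^2 / \sum_i q_{ij'}\right)},
\qquad
L = \mathrm{KL}\left(P \,\Vert\, Q\right) = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{q_{ij}},
\]

where the loss is minimized with respect to both the encoder parameters (producing $z_i$) and the centroids $\mu_j$, which is what allows representations and cluster assignments to be learned together.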

Highlights

  • Pre-trained language models have shown remarkable progress in many natural language understanding tasks (Radford et al., 2018; Peters et al., 2018; Howard and Ruder, 2018).

  • While BERT has achieved great success in many natural language understanding tasks under supervised fine-tuning, relatively little work has focused on applying pre-trained models in unsupervised settings.

  • Through a case study of text clustering, we investigate how to leverage the pre-trained BERT model and fine-tune it in an unsupervised setting.

Summary

Introduction

Pre-trained language models have shown remarkable progress in many natural language understanding tasks (Radford et al., 2018; Peters et al., 2018; Howard and Ruder, 2018). BERT (Devlin et al., 2018) applies the fine-tuning approach to achieve ground-breaking performance on a set of NLP tasks. However, while BERT has achieved great success under supervised fine-tuning, relatively little work has focused on applying pre-trained models in unsupervised settings. Two-stage approaches use deep learning frameworks to learn representations first and then run clustering algorithms on them (Chen, 2015; Yang et al., 2017). Joint optimization approaches learn representations and cluster assignments together (Xie et al., 2016; Guo et al., 2017). Inspired by these methods, we fine-tune pre-trained models by learning text representations and cluster assignments simultaneously.
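
To make the joint-optimization idea concrete, the following is a minimal sketch that places DEC-style learnable centroids and a KL-based clustering loss (the loss illustrated after the abstract) on top of a pre-trained BERT encoder. The model name, the use of the [CLS] vector as the text representation, and all hyperparameters are illustrative assumptions rather than the paper's reported setup.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    from transformers import BertModel, BertTokenizer

    class BertClusterer(nn.Module):
        """BERT encoder plus learnable cluster centroids, trained jointly."""
        def __init__(self, n_clusters, bert_name="bert-base-uncased"):
            super().__init__()
            self.bert = BertModel.from_pretrained(bert_name)
            hidden = self.bert.config.hidden_size
            # Cluster centroids live in the same space as the text representations.
            self.centroids = nn.Parameter(torch.randn(n_clusters, hidden) * 0.02)

        def forward(self, input_ids, attention_mask):
            # Text representation: the [CLS] token embedding (one common choice).
            z = self.bert(input_ids=input_ids,
                          attention_mask=attention_mask).last_hidden_state[:, 0]
            # Soft assignments q_ij via a Student's t-kernel (alpha = 1), as in DEC.
            q = 1.0 / (1.0 + torch.cdist(z, self.centroids) ** 2)
            return q / q.sum(dim=1, keepdim=True)

    def target_distribution(q):
        # Sharpened auxiliary distribution p_ij used as the self-training target.
        weight = q ** 2 / q.sum(dim=0)
        return (weight.t() / weight.sum(dim=1)).t()

    if __name__ == "__main__":
        texts = ["great food and friendly staff", "what is the capital of France?"]
        tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
        batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

        model = BertClusterer(n_clusters=2)
        optimizer = torch.optim.Adam(model.parameters(), lr=2e-5)

        # One joint update: the clustering-oriented KL loss back-propagates into BERT,
        # so text representations and cluster assignments are refined together.
        q = model(batch["input_ids"], batch["attention_mask"])
        p = target_distribution(q).detach()
        loss = F.kl_div(q.log(), p, reduction="batchmean")
        loss.backward()
        optimizer.step()

In practice, the centroids would typically be initialized with k-means over the initial BERT embeddings and the target distribution refreshed only periodically rather than at every step; those details are omitted here for brevity.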
