Discovering Topic Representative Terms for Short Text Clustering

Shuiqiao Yang, Guangyan Huang, Borui Cai

doi:10.1109/access.2019.2927345

Abstract

Clustering short texts is one of the most important text analysis methods for extracting knowledge from online social media platforms such as Twitter, Facebook, and Weibo. However, the instant features of short texts (such as abbreviations and informal expressions) and their limited length make the clustering task challenging. Fortunately, short texts about the same topic often share common terms (or term stems) that can effectively represent a topic (i.e., one supported by a cluster of short texts); we call them topic representative terms. With topic representative terms, clustering becomes much easier: each short text is assigned to the most similar group of topic representative terms. This paper proposes a novel topic representative term discovery (TRTD) method for short text clustering. TRTD discovers groups of closely bound topic representative terms by exploiting the closeness and significance of terms. Closeness is measured by the terms' interdependent co-occurrence, and significance is measured by their global occurrences throughout the whole short text corpus. Experimental results on real-world datasets demonstrate that TRTD achieves better accuracy and efficiency in short text clustering than state-of-the-art methods.
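To make the two measures in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It rests on several assumptions that the abstract does not confirm: significance is taken as document frequency with a hypothetical cutoff `min_df`; closeness is taken as the smaller of the two conditional co-occurrence probabilities, with a hypothetical cutoff `min_closeness`; groups are formed by merging close terms with a simple union-find; and a text is assigned to the group with the largest term overlap. All function names are illustrative.

```python
from collections import Counter
from itertools import combinations


def term_stats(docs):
    """Count texts containing each term, and texts containing each term pair."""
    doc_freq, co_freq = Counter(), Counter()
    for doc in docs:
        terms = sorted(set(doc.lower().split()))
        doc_freq.update(terms)
        co_freq.update(combinations(terms, 2))  # pairs are alphabetically ordered
    return doc_freq, co_freq


def closeness(a, b, doc_freq, co_freq):
    """Assumed closeness: min of P(a | b) and P(b | a), so both terms must depend on each other."""
    both = co_freq[tuple(sorted((a, b)))]
    return min(both / doc_freq[a], both / doc_freq[b])


def discover_groups(docs, min_df=2, min_closeness=0.5):
    """Merge significant terms that are mutually close into groups (union-find)."""
    doc_freq, co_freq = term_stats(docs)
    significant = [t for t in sorted(doc_freq) if doc_freq[t] >= min_df]
    parent = {t: t for t in significant}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    for a, b in combinations(significant, 2):
        if closeness(a, b, doc_freq, co_freq) >= min_closeness:
            parent[find(a)] = find(b)

    groups = {}
    for t in significant:
        groups.setdefault(find(t), set()).add(t)
    # Keep only groups with at least two terms, in a deterministic order.
    return sorted((g for g in groups.values() if len(g) > 1), key=min)


def assign(doc, groups):
    """Return the index of the group sharing the most terms with doc, or None."""
    terms = set(doc.lower().split())
    overlaps = [len(terms & g) for g in groups]
    if not overlaps or max(overlaps) == 0:
        return None
    return overlaps.index(max(overlaps))


# Toy corpus (invented for illustration only).
DOCS = [
    "apple iphone launch event",
    "new iphone apple rumor",
    "apple iphone price",
    "nba finals game tonight",
    "nba finals score",
    "watch nba finals live",
]
GROUPS = discover_groups(DOCS)
```

On this toy corpus the sketch yields two groups, `{"apple", "iphone"}` and `{"finals", "nba"}`, and `assign("iphone battery issue", GROUPS)` selects the first. Taking the minimum of the two conditional probabilities is one way to read "interdependent": a frequent term that merely co-occurs with a rare one in passing does not count as close to it.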

Highlights

Short text documents are increasingly available with the advancement of online social media platforms such as Twitter, Facebook, and Weibo.
Inspired by previous studies [5], [10], [18] that use word relation networks to address the difficulties of short text clustering, this paper proposes a novel topic representative term discovery (TRTD) method that finds significant terms closely bound up with each other and treats them as a group of topic representative terms for short text clustering.
Short text clustering accuracy analysis: this subsection studies the clustering accuracy of topic representative term discovery (TRTD) and the counterpart methods.

Summary

Introduction

Short text documents are increasingly available with the advancement of online social media platforms such as Twitter, Facebook, and Weibo. Clustering short text documents is one of the most significant text analysis methods for extracting knowledge from the abundant text data on the internet, such as news titles and tweets. According to many researchers [4]–[6], short text clustering is more challenging than regular text clustering. This is because the instant features (e.g., abbreviations and informal expressions) and the shortness of the text bring sparsity, noise, and high dimensionality to text analytics. Short texts contain a lot of noise and provide limited contextual clues for traditional data mining techniques. Many adapted approaches have been proposed for short text clustering in recent years.


Journal: IEEE Access
Publication Date: Jan 1, 2019
Citations: 38
License type: CC BY 4.0


Similar Papers

Transferring topical knowledge from auxiliary long texts for short text clustering
Ou Jin ... Qiang Yang
24 Oct 2011

Deep Structured Clustering of Short Text
Junxian Wu ... Yongqi Li
01 Jan 2021

Concept decompositions for short text clustering by identifying word communities
Caiyan Jia ... Jian Yu
Pattern Recognition | VOL. 76
10 Oct 2017

Effects on Time and Quality of Short Text Clustering during Real-Time Presentations
Diego Fuentealba ... Héctor Ponce
IEEE Latin America Transactions | VOL. 19
01 Aug 2021
