Combining Distributed Word Representation and Document Distance for Short Text Document Clustering

Supavit Kongwudhikunakorn, Kitsana Waiyamai

doi:10.3745/jips.04.0164

Abstract

This paper presents a method for clustering short text documents, such as news headlines, social media statuses, or instant messages. Because these documents are usually short and sparse, an appropriate technique is required to discover the knowledge hidden in them. The objective of this paper is to identify the combination of document representation, document distance, and document clustering that yields the best clustering quality. Document representations are enriched with external knowledge in the form of distributed word representations. To cluster documents, a K-means partitioning-based clustering technique is applied, in which the similarity between documents is measured by the word mover's distance. To validate the effectiveness of the proposed method, experiments were conducted to compare its clustering quality against several leading methods. The proposed method produced document clusters with higher precision, recall, F1-score, and adjusted Rand index on both real-world and standard data sets. Furthermore, the clustering results were inspected manually to observe the efficacy of the proposed method. The topic of each document cluster is clearly reflected by the members of that cluster.
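The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and makes several assumptions: the 2-D word vectors in `EMBEDDINGS` are invented for illustration (a real system would load pretrained embeddings), the distance is the relaxed word mover's distance (a cheaper lower bound on the exact word mover's distance, which requires an optimal transport solver), and the K-means centroid update is replaced by a medoid update, since a mean document is not directly defined under this distance. The names `nbow`, `relaxed_wmd`, and `cluster` are hypothetical helpers.

```python
import math
from collections import Counter

# Hypothetical 2-D word vectors, invented for illustration only.
# A real system would load pretrained distributed representations.
EMBEDDINGS = {
    "stocks": (0.90, 0.10), "market": (0.80, 0.20),
    "shares": (0.85, 0.15), "rally": (0.70, 0.30),
    "football": (0.10, 0.90), "match": (0.20, 0.80),
    "goal": (0.15, 0.85), "team": (0.25, 0.75),
}

def nbow(doc):
    """Normalized bag-of-words weights over in-vocabulary tokens."""
    counts = Counter(t for t in doc.lower().split() if t in EMBEDDINGS)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def one_sided(a, b):
    """Every word in `a` sends all of its mass to its nearest word in `b`."""
    return sum(
        weight * min(math.dist(EMBEDDINGS[x], EMBEDDINGS[y]) for y in b)
        for x, weight in a.items()
    )

def relaxed_wmd(doc_a, doc_b):
    """Relaxed word mover's distance: the larger of the two one-sided bounds."""
    a, b = nbow(doc_a), nbow(doc_b)
    if not a or not b:
        raise ValueError("each document needs at least one known word")
    return max(one_sided(a, b), one_sided(b, a))

def cluster(docs, k, init, max_iter=20):
    """K-means-style loop using medoids; `init` lists the k starting indices."""
    n = len(docs)
    dist = [[relaxed_wmd(docs[i], docs[j]) for j in range(n)] for i in range(n)]
    medoids = list(init)
    labels = [0] * n
    for _ in range(max_iter):
        # Assignment step: attach each document to its nearest medoid.
        labels = [min(range(k), key=lambda c: dist[i][medoids[c]])
                  for i in range(n)]
        # Update step: the new medoid minimizes total distance to its members.
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if a cluster empties
                new_medoids.append(medoids[c])
                continue
            new_medoids.append(
                min(members, key=lambda i: sum(dist[i][j] for j in members))
            )
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return labels, medoids

DOCS = [
    "stocks rally market", "market shares rally",
    "football match goal", "team goal match",
    "shares market", "football team",
]
labels, medoids = cluster(DOCS, k=2, init=[0, 2])
print(labels)  # → [0, 0, 1, 1, 0, 1]
```

On these toy headlines the finance and sports documents fall into separate clusters even where they share no words, which is the effect the distributed representation is meant to provide for short, sparse text.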

Journal: Journal of Information Processing Systems
Publication Date: Apr 1, 2020
Citations: 1

Similar Papers

Short Text Document Clustering using Distributed Word Representation and Document Distance
Supavit Kongwudhikunakorn ... Kitsana Waiyamai
Walailak Journal of Science and Technology (WJST) | VOL. 16
26 Mar 2018

A Hybrid Salp Swarm Algorithm with β-Hill Climbing Algorithm for Text Documents Clustering
Ammar Kamal Abasi ... Mohammed Azmi Al-Betar
01 Jan 2020

Document representation and clustering models for bilingual documents clustering
Shutian Ma ... Chengzhi Zhang
Proceedings of the Association for Information Science and Technology | VOL. 54
01 Jan 2017

Document clustering analysis with aid of adaptive Jaro Winkler with Jellyfish search clustering algorithm
Perumal Pitchandi ... Mathivanan Balakrishnan
Advances in Engineering Software | VOL. 175
01 Nov 2022
