W-Hash: A Novel Word Hash Clustering Algorithm for Large-Scale Chinese Short Text Analysis

Yaofeng Chen, Weipeng Cao, Chunyang Zhang, Meikang Qiu, Long Ye, Xiaogang Peng

doi:10.1007/978-3-031-10989-8_42

Abstract

Short text clustering is an unsupervised learning technique for pattern discovery and analysis of short text datasets, and it has been applied in many scenarios such as business risk control and audit. With the development of digitalization over the last few years, the data scale in various scenarios has increased rapidly. Traditional short text clustering methods such as K-means face many challenges in large-scale data analysis, such as hyperparameters that are difficult to preset and high computational complexity. To alleviate these problems, we propose a novel clustering algorithm called the Word Hash clustering algorithm (W-Hash) for Chinese short text analysis. Specifically, W-Hash does not require a pre-specified number of clusters, and it has much lower computational complexity than traditional clustering approaches. To verify the effectiveness of W-Hash, we apply it to a real-life business audit problem. The experimental results show that W-Hash outperforms traditional clustering algorithms in both training time and the rationality of the results.

Keywords: Short text clustering, Clustering, K-means, Business audit
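The abstract does not describe how W-Hash works internally, so the following is only a generic sketch of hash-based grouping, not the authors' algorithm. It illustrates why such an approach needs no preset number of clusters and can run in a single pass over the data. The function names `word_key` and `hash_cluster`, the stopword set, and the use of pre-segmented token lists are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict

def word_key(tokens, stopwords=frozenset()):
    """Order-insensitive key built from a text's distinct content words.

    Hypothetical keying rule for illustration; the paper's rule may differ.
    """
    words = sorted(set(tokens) - stopwords)
    return hashlib.md5(" ".join(words).encode("utf-8")).hexdigest()

def hash_cluster(token_lists, stopwords=frozenset()):
    """Single pass: texts whose keys collide land in the same cluster.

    The number of clusters is not chosen in advance; it is simply the
    number of distinct keys seen in the data.
    """
    clusters = defaultdict(list)
    for idx, tokens in enumerate(token_lists):
        clusters[word_key(tokens, stopwords)].append(idx)
    return list(clusters.values())

# Pre-segmented short texts (word segmentation is assumed to be done upstream).
texts = [
    ["办公", "用品", "采购"],        # office / supplies / purchase
    ["采购", "办公", "用品"],        # same words, different order
    ["差旅", "费", "报销"],          # travel / expense / reimbursement
    ["报销", "差旅", "费", "的"],    # same words plus a stopword
]
groups = hash_cluster(texts, stopwords=frozenset({"的"}))
print(groups)
```

Each text is visited once and placed by dictionary lookup, so the cost grows linearly with the number of texts, whereas K-means recomputes distances to every centroid on each iteration and requires the number of clusters up front.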

Similar Papers

A clustering algorithm based on maximum entropy principle
Yang Zhao ... Fangai Liu
Journal of Physics: Conference Series | VOL. 887
01 Aug 2017

Research of text clustering based on improved VSM by TF under the framework of Mahout
Cao Langcai ... Li Zhihui
01 May 2017

Research of text clustering based on fuzzy granular computing
Zhang Xia ... Zhao Hailong
01 Jan 2009

A Novel Approch for Clustering of Chinese Text Based on Concept Hierarchy
Peng Zhao ... Huan-Tong Geng
01 Jul 2006
