Abstract

Short text clustering is an unsupervised learning technique for pattern discovery and analysis of short text datasets, and it has been applied in many scenarios such as business risk control and audit. With the growth of digitalization over the last few years, the scale of data in these scenarios has increased rapidly. Traditional short text clustering methods such as K-means face many challenges in large-scale data analysis, including hyperparameters that are difficult to preset and high computational complexity. To alleviate these problems, we propose a novel clustering algorithm for Chinese short text analysis, the Word Hash clustering algorithm (W-Hash). Specifically, W-Hash does not require a pre-specified number of clusters, and it has much lower computational complexity than traditional clustering approaches. To verify the effectiveness of W-Hash, we apply it to a real-life business audit problem. The experimental results show that W-Hash outperforms traditional clustering algorithms in both training time and the rationality of the results.

Keywords

Short text clustering, Clustering, K-means, Business audit
