Abstract

K-means is a commonly used text clustering algorithm whose main advantages are simplicity and speed. However, because the initial cluster centers are selected at random, K-means is prone to converging to a local optimum, and both the clustering results and the number of iterations are unstable. To address this problem, this paper selects the initial cluster centers with a hierarchical agglomerative clustering algorithm to ensure high-quality starting centers, uses cosine similarity to measure the distance between texts, and reconstructs both the formula for computing cluster centers and the objective function for clustering quality. Experiments on the Sogou Chinese text corpus show that the improved K-means algorithm achieves relatively high accuracy and stability.
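
The overall idea can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not give the reconstructed center formula or objective function, so this sketch assumes the normalized mean as the cluster center (as in spherical K-means) and centroid-based merging for the agglomerative step. All function names here (`agglomerative_seeds`, `kmeans_cosine`, and the helpers) are hypothetical, and documents are assumed to be already represented as numeric vectors such as TF-IDF weights.

```python
import math

def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)

def cosine(a, b):
    """Cosine similarity of two unit-length vectors (their dot product)."""
    return sum(x * y for x, y in zip(a, b))

def centroid(vectors):
    """Normalized mean of a group of vectors (assumed center formula)."""
    dim = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return normalize(mean)

def agglomerative_seeds(docs, k):
    """Merge the two most similar clusters until k remain; return their centroids."""
    clusters = [[d] for d in docs]
    while len(clusters) > k:
        best, pair = -2.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = cosine(centroid(clusters[i]), centroid(clusters[j]))
                if sim > best:
                    best, pair = sim, (i, j)
        i, j = pair
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [centroid(c) for c in clusters]

def kmeans_cosine(vectors, k, max_iter=100):
    """K-means with cosine similarity, seeded by agglomerative clustering."""
    docs = [normalize(v) for v in vectors]
    centers = agglomerative_seeds(docs, k)  # deterministic seeds, no random choice
    labels = None
    for _ in range(max_iter):
        # Assign each document to the center with the highest cosine similarity.
        new_labels = [max(range(k), key=lambda c: cosine(d, centers[c])) for d in docs]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Recompute each non-empty cluster's center from its members.
        for c in range(k):
            members = [d for d, l in zip(docs, labels) if l == c]
            if members:
                centers[c] = centroid(members)
    return labels, centers
```

Calling `kmeans_cosine(vectors, k)` returns one cluster label per input vector together with the final centers. Because the seeds come from a deterministic merge procedure rather than random sampling, repeated runs on the same input give the same result, which is the stability property the abstract emphasizes. The naive pairwise merge loop is costly on large corpora, so in practice the seeding step would likely run on a sample of the documents.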

