Abstract

<p>Traditional K-means text clustering cannot determine the number of clusters automatically and is sensitive to the choice of initial cluster centers. To address these problems, this paper proposes a Canopy-MMD text clustering algorithm optimized with simulated annealing and the silhouette coefficient. The algorithm combines simulated annealing with the silhouette coefficient to optimize the Canopy algorithm and find the optimal number of clusters, then uses that number to determine the scale coefficient in the MMD algorithm, which yields a better text clustering result. Experiments on the Sohu News dataset from Sogou Lab compare the clustering results with those of traditional K-means and of algorithms from the literature. The results show that the proposed algorithm outperforms both traditional K-means and the algorithms from the literature: relative to traditional K-means, accuracy, precision, recall and F value improve by 8.02%, 8.91%, 8.02% and 9.51%, respectively. The algorithm can be applied in fields such as text mining, knowledge graphs and natural language processing.</p>
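The idea of letting simulated annealing and the silhouette coefficient pick the number of clusters can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes that the annealed quantity is a single Canopy distance threshold (called `t2` here), uses a simplified one-threshold Canopy pass, and works on toy 2-D points with Euclidean distance rather than text vectors. All function names and parameters (`canopy_centers`, `anneal_threshold`, `steps`, `cooling`, and so on) are hypothetical.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopy_centers(points, t2):
    """Simplified Canopy pass (assumption: one threshold only).
    Take the first remaining point as a center, drop every point
    within t2 of it, and repeat until no points remain."""
    remaining = list(points)
    centers = []
    while remaining:
        c = remaining.pop(0)
        centers.append(c)
        remaining = [p for p in remaining if dist(p, c) > t2]
    return centers


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: dist(p, centers[j]))
            for p in points]


def silhouette(points, labels):
    """Mean silhouette coefficient; -1.0 if there are fewer than 2 clusters.
    Points in singleton clusters contribute 0."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return -1.0
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(dist(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        m = max(a, b)
        if m > 0:
            total += (b - a) / m
    return total / len(points)


def anneal_threshold(points, t_lo, t_hi, steps=200, temp=1.0,
                     cooling=0.97, seed=0):
    """Simulated annealing over t2 in [t_lo, t_hi], maximizing silhouette.
    Returns (best_score, best_t2, best_k)."""
    rng = random.Random(seed)

    def score(t2):
        centers = canopy_centers(points, t2)
        return silhouette(points, assign(points, centers)), len(centers)

    cur = rng.uniform(t_lo, t_hi)
    cur_s, cur_k = score(cur)
    best = (cur_s, cur, cur_k)
    for _ in range(steps):
        step = rng.gauss(0, 0.1 * (t_hi - t_lo))
        cand = min(t_hi, max(t_lo, cur + step))
        s, k = score(cand)
        # Always accept improvements; accept worse moves with a
        # probability that shrinks as the temperature drops.
        if s >= cur_s or rng.random() < math.exp((s - cur_s) / temp):
            cur, cur_s, cur_k = cand, s, k
            if cur_s > best[0]:
                best = (cur_s, cur, cur_k)
        temp *= cooling
    return best


# Toy data: three tight, well-separated groups of five points each.
offsets = [(0, 0), (0.6, 0), (0, 0.6), (-0.6, 0), (0, -0.6)]
points = [(cx + dx, cy + dy)
          for cx, cy in [(0, 0), (10, 0), (0, 10)]
          for dx, dy in offsets]

best_score, best_t2, best_k = anneal_threshold(points, t_lo=0.2, t_hi=8.0)
print(best_k, round(best_score, 3))
```

In this sketch the cluster count is never set by hand: it falls out of the threshold that scores best on the silhouette coefficient. How the selected count then fixes the scale coefficient of the MMD step is specific to the paper and is not reproduced here.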
