Abstract

With the explosive growth of social media content in recent years, mining trending information from social media has become an active research direction. In this paper, Python crawlers are used to collect semi-structured text data on food safety news from both static and dynamic web pages. After preprocessing, this yields the structured text data needed to build CASC, a document clustering algorithm based on a convolutional neural network. CASC uses the feature extraction ability of a convolutional neural network and an autoencoder to embed the data into a low-dimensional latent space for clustering, while preserving the internal structure of the original data as far as possible. Its performance is then compared with that of the K-means and spectral clustering algorithms. The experimental results show that CASC reduces running time and time complexity while maintaining clustering accuracy. CASC outperforms both K-means and spectral clustering in precision, recall, and the composite index. In addition, its running time is 91 seconds shorter than that of K-means and 5 seconds shorter than that of spectral clustering.
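The abstract does not say how precision, recall, and the composite index are computed for a clustering. One common convention, assumed here purely for illustration, is pair counting: every pair of documents placed in the same cluster counts as a positive prediction, and the composite index is taken to be the F1 score. The function name `pairwise_scores` and the toy labels are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def pairwise_scores(true_labels, cluster_labels):
    """Pair-counting precision, recall, and F1 for a clustering.

    Assumption: a pair of documents is a "positive prediction" when both
    fall in the same cluster, and "truly positive" when both share the
    same reference class. The paper may use a different definition.
    """
    if len(true_labels) != len(cluster_labels):
        raise ValueError("label lists must have the same length")

    tp = fp = fn = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_class = true_labels[i] == true_labels[j]
        same_cluster = cluster_labels[i] == cluster_labels[j]
        if same_cluster and same_class:
            tp += 1  # correctly grouped pair
        elif same_cluster:
            fp += 1  # grouped together but from different classes
        elif same_class:
            fn += 1  # same class but split across clusters

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy example: five documents, two reference classes, two clusters.
reference = ["a", "a", "a", "b", "b"]
predicted = [0, 0, 1, 1, 1]
print(pairwise_scores(reference, predicted))
```

Under this convention the scores do not depend on how cluster identifiers are numbered, so the output of CASC, K-means, and spectral clustering could be scored against the same reference labels without first matching cluster IDs to classes.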
