With the rapid development of Internet technology, the influence of online consensus continues to expand. Quickly and effectively discovering sensitive topics and tracking them has recently become an important research problem. Text clustering aggregates news texts with the same or similar content and thereby discovers topics automatically. Improving clustering algorithms to suit different media types is the main research direction. Although existing typical clustering algorithms have certain advantages, they all face constraints on data size and data characteristics in specific applications, and no existing algorithm fully adapts to these characteristics. Single-pass algorithms are widely applied in the topic detection and tracking (TDT) field and can discover and track topics, but they suffer from poor accuracy and slow speed on massive data. Based on the dynamic evolution characteristics of online consensus, this paper proposes an incremental text clustering algorithm based on Single-pass that improves clustering accuracy and efficiency on massive news data. Using real online news texts from the online consensus analysis system, we conduct an experiment to verify the feasibility and effectiveness of the proposed algorithm. The results show that the new algorithm is much more efficient than the original Single-pass clustering algorithm. In real applications, the new incremental text clustering algorithm basically meets the real-time demands of online topic detection and has practical value.
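For orientation, the baseline Single-pass procedure that the abstract refers to can be sketched as follows. This is a minimal illustration of the classic algorithm only, not the improved incremental method proposed in the paper. The whitespace tokenization, raw term counts, summed-count centroids, the 0.3 similarity threshold, and the function names `cosine` and `single_pass` are all assumptions made for this example.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_pass(docs, threshold=0.3):
    """Assign each document, in arrival order, to its most similar cluster,
    or open a new cluster when no similarity reaches the threshold.

    The threshold value is an illustrative assumption, not a tuned setting.
    """
    centroids = []  # one summed term-count Counter per cluster
    labels = []
    for text in docs:
        # Assumed preprocessing: lowercase and split on whitespace.
        vec = Counter(text.lower().split())
        best, best_sim = -1, 0.0
        # Each new document is compared against every existing cluster.
        for i, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best >= 0 and best_sim >= threshold:
            centroids[best].update(vec)  # fold the document into the cluster
            labels.append(best)
        else:
            centroids.append(Counter(vec))  # start a new topic cluster
            labels.append(len(centroids) - 1)
    return labels


docs = [
    "stock market falls sharply",
    "stock market falls again",
    "team wins football final",
    "football team wins cup final",
]
labels = single_pass(docs)
```

In this sketch every incoming document is compared against all existing clusters, so the per-document cost grows with the number of topics, which is one plausible reading of the slowdown on massive data noted above.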