Abstract

With the development of the Internet, the amount of information on the network has grown rapidly, making it increasingly difficult to obtain useful information. This is especially true for individuals, enterprises, and institutions that handle large amounts of information: integrating and analyzing Internet information by manual effort alone is an almost impossible task. Internet hot event mining and analysis technology can address this problem by alleviating information overload, integrating redundant information, and refining core information. In this paper, we study hot event topic sentence generation within the field of hot event mining and design a hybrid event candidate set construction algorithm based on topic core word mapping and event triad selection. The algorithm uses the PAT-Tree technique to extract high-frequency core words from topic hotspots and maps these words back to sentences to generate one part of the event core sentences. The other part is obtained by treating event triples as candidate elements and extracting, from the topic hotspots, the sentences that contain these event elements. The two sets of event core sentences are merged, filtered, and sorted to obtain the candidate set, which can be used to build a word graph-based main service channel (MSC) model. We also propose an improved word graph-based MSC model and use it to extract event topic sentences. Based on this research, we implement a hot event analysis system. The system analyzes existing topic data and applies the event topic sentence generation algorithm studied in this paper to generate titles for hotspots, that is, hot events. It also displays the topics along different dimensions and visualizes the data.
The visualization includes the trend of event hotness over time, the trend of event sentiment polarity, and the distribution of event article sources.
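The hybrid candidate set construction described above can be illustrated with a minimal sketch. It makes several assumptions that are not from the paper: frequent token n-gram counting stands in for the PAT-Tree (which indexes all suffixes so that the frequency of every substring can be read off), event triples are assumed to be extracted upstream, and the scoring, merging, and filtering rules as well as the function names core_words and build_candidates are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Ngram = Tuple[str, ...]
Triple = Tuple[str, str, str]  # (subject, predicate, object), assumed extracted upstream


def contains(seq: Tuple[str, ...], sub: Tuple[str, ...]) -> bool:
    """True if sub occurs as a contiguous run inside seq."""
    n = len(sub)
    return any(seq[i:i + n] == sub for i in range(len(seq) - n + 1))


def core_words(docs: List[List[str]], max_n: int = 3, min_freq: int = 2) -> Dict[Ngram, int]:
    """Frequent token n-grams: a simplified stand-in (assumption) for the
    substring frequencies a PAT-Tree exposes over all suffixes of the text."""
    counts: Counter = Counter()
    for tokens in docs:
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    frequent = {g: c for g, c in counts.items() if c >= min_freq}
    # Keep an n-gram only if no longer frequent n-gram containing it has the
    # same count, i.e. it also occurs outside that longer phrase.
    return {
        g: c for g, c in frequent.items()
        if not any(len(h) > len(g) and ch == c and contains(h, g)
                   for h, ch in frequent.items())
    }


def build_candidates(sentences: List[List[str]], triples: List[Triple],
                     max_n: int = 3, min_freq: int = 2, min_len: int = 3) -> List[str]:
    """Hypothetical hybrid candidate set: core-word mapping plus triple matching."""
    core = core_words(sentences, max_n, min_freq)
    scores: Dict[str, float] = {}
    for tokens in sentences:
        if len(tokens) < min_len:  # filter: drop very short sentences
            continue
        seq = tuple(tokens)
        # Part 1: map core words back to the sentences that contain them.
        core_score = sum(c for g, c in core.items() if contains(seq, g))
        # Part 2: sentences containing every element of an event triple.
        triple_score = sum(1 for t in triples if all(e in tokens for e in t))
        if core_score or triple_score:
            key = " ".join(tokens)  # mixing: identical sentences merge here
            scores[key] = max(scores.get(key, 0), core_score + triple_score)
    # Sort by combined score; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda s: (-scores[s], s))
```

Adding the two raw scores is only one possible way to mix the sets; since the core-word score and the triple score live on different scales, a real system would likely normalize them first.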

Highlights

  • Internet hot event mining and analysis technology can address the difficulty of obtaining useful information from the Internet by alleviating information overload, integrating redundant information, and extracting core information

  • Most current research on hot events relies on clustering technology for hot topic discovery, but topics cannot be equated with events. A topic is aggregated from multiple hotspots describing the same event, whereas an event is a phrase that concisely summarizes the main content of a topic; an event can therefore be understood as the title of a topic. Research on web hot event mining involves natural language processing techniques such as topic detection and tracking (TDT), hotspot clustering, and title generation [1]. This technology mines information that is valuable to people, according to specific needs, from big data.

  • We analyzed the 98 event topic sentences generated by the three algorithms and found that the baseline method often produced “off-topic” event topic sentences; that is, the generated sentences did not represent the topic of the event, although they scored better on linguistic coherence than on information content. The low accuracy of the baseline algorithm stems from its use of the conditional probabilities of words provided by a language model to obtain the highest-scoring word sequence. These probabilities capture how likely two words are to occur together: the higher the probability, the more readily the two words combine, so linguistic coherence is guaranteed, but this does not ensure that the sequence reflects the event topic
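The baseline's scoring step described in the last highlight can be illustrated with a small sketch. It assumes an add-k smoothed bigram language model and length-normalized log-probability scoring; the paper does not state which language model or normalization the baseline uses, and the names BigramLM and pick_best are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


class BigramLM:
    """Add-k smoothed bigram language model (assumed stand-in for the baseline's LM)."""

    def __init__(self, corpus: List[List[str]], k: float = 1.0):
        self.k = k
        self.contexts: Counter = Counter()  # how often each word is followed by something
        self.bigrams: Counter = Counter()
        for sent in corpus:
            tokens = ["<s>"] + sent + ["</s>"]
            self.contexts.update(tokens[:-1])
            self.bigrams.update(zip(tokens, tokens[1:]))
        self.vocab = {w for s in corpus for w in s} | {"</s>"}

    def prob(self, prev: str, word: str) -> float:
        """Conditional probability P(word | prev) with add-k smoothing."""
        return (self.bigrams[(prev, word)] + self.k) / (
            self.contexts[prev] + self.k * len(self.vocab))

    def score(self, sent: Sequence[str]) -> float:
        """Average log-probability per transition, so longer candidates
        are not penalized merely for their length."""
        tokens = ["<s>"] + list(sent) + ["</s>"]
        logp = sum(math.log(self.prob(a, b)) for a, b in zip(tokens, tokens[1:]))
        return logp / (len(tokens) - 1)


def pick_best(lm: BigramLM, candidates: List[List[str]]) -> List[str]:
    """Return the candidate word sequence with the highest LM score."""
    return max(candidates, key=lm.score)
```

Nothing in score looks at which event the sequence describes; it only rewards word pairs that frequently co-occur in the training text. That is consistent with the observation that such a baseline yields fluent but sometimes off-topic sentences.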


Summary

Research Article

Received 29 October 2021; Revised 19 November 2021; Accepted 4 December 2021; Published 17 December 2021.

Internet hot event mining and analysis technology can help people cope with the rapidly growing volume of online information by alleviating information overload, integrating redundant information, and refining core information. We study hot event topic sentence generation in the field of hot event mining and design a hybrid event candidate set construction algorithm based on topic core word mapping and event triad selection. The sets of event core sentences generated by these two methods are merged, filtered, and sorted to obtain the candidate set, which can be used to build a word graph-based main service channel (MSC) model. We propose an improved word graph-based MSC model and use it to extract event topic sentences. The topics are displayed along different dimensions, and the data are visualized. The visualization includes the trend of event hotness over time, the trend of event sentiment polarity, and the distribution of event article sources.
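To make the idea of extracting a topic sentence from a word graph more concrete, here is a minimal sketch of generic word-graph sentence fusion. It is not the paper's improved MSC model: the merge rule (identical lowercased words share one node), the edge weight 1/count, the minimum length min_words, and the names build_word_graph and compress are all assumptions chosen for illustration.

```python
import heapq
from collections import Counter
from typing import Dict, List, Optional, Tuple

START, END = "<start>", "<end>"
Graph = Dict[str, List[Tuple[str, float]]]


def build_word_graph(sentences: List[List[str]]) -> Graph:
    """Directed word graph: identical lowercased words from different
    sentences share one node (simplified merge rule), and each pair of
    adjacent words adds an edge."""
    edge_count: Counter = Counter()
    for sent in sentences:
        tokens = [START] + [w.lower() for w in sent] + [END]
        edge_count.update(zip(tokens, tokens[1:]))
    graph: Graph = {}
    for (a, b), c in edge_count.items():
        # Assumed weight: transitions shared by many sentences are cheaper.
        graph.setdefault(a, []).append((b, 1.0 / c))
    return graph


def compress(sentences: List[List[str]], min_words: int = 5) -> Optional[List[str]]:
    """Lowest-cost simple path from START to END with at least min_words
    words, found by uniform-cost search over partial paths."""
    graph = build_word_graph(sentences)
    heap: List[Tuple[float, List[str]]] = [(0.0, [START])]
    while heap:
        cost, path = heapq.heappop(heap)
        node = path[-1]
        if node == END:
            words = path[1:-1]
            if len(words) >= min_words:
                return words  # paths pop in cost order, so this is the cheapest
            continue
        for nxt, weight in graph.get(node, []):
            if nxt not in path:  # never revisit a node, so paths stay simple
                heapq.heappush(heap, (cost + weight, path + [nxt]))
    return None
```

Searching over whole partial paths keeps the sketch short and guarantees a simple path, but it can grow exponentially on large graphs; a practical system would use a k-shortest-paths algorithm and rerank the results.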

Introduction
Conclusion
