Abstract

With news being published rapidly and in large volumes over the Internet, there is growing interest in detecting the underlying hot events in online news streams. Event detection poses two main challenges: accuracy and scalability. In this paper, we propose a fast and efficient method for detecting events in news websites. First, we identify bursty terms, i.e., terms that suddenly appear in a large number of news documents. Then, we construct a novel co-occurrence graph over these terms, in which nodes and edges are weighted by important features such as click and document frequency within burst intervals. Finally, a weighted community detection algorithm is used to cluster the terms and find events. We also propose a couple of techniques to reduce the size of the graph. Our evaluation shows that the proposed method achieves much higher precision and recall than previous methods, improving their harmonic mean (the F-measure) by at least 40%. Moreover, it reduces running time and memory usage by a factor of at least 2.
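The three steps described above can be illustrated with a minimal sketch. It is not the method proposed in the paper: the toy data, the burst rule (current document frequency compared with the past per-slot average), the click-based weights, the edge-pruning threshold, and all function names are assumptions made for illustration. Because the abstract does not name the weighted community detection algorithm, connected components over the pruned graph are used here as a simple stand-in.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy stream: (time_slot, click_count, terms in the document).
DOCS = [
    (0, 1, {"weather", "city"}),
    (0, 2, {"sports", "city"}),
    (1, 1, {"weather", "sports"}),
    (2, 50, {"earthquake", "rescue", "city"}),
    (2, 40, {"earthquake", "rescue"}),
    (2, 30, {"election", "vote"}),
    (2, 35, {"election", "vote", "city"}),
    (2, 5, {"earthquake", "tsunami"}),
    (2, 1, {"earthquake", "election"}),
]


def bursty_terms(docs, current_slot, min_df=2, ratio=2.0):
    """Assumed burst rule: document frequency (df) in the current slot is at
    least min_df and exceeds ratio times the average df per past slot."""
    past_df, cur_df = defaultdict(int), defaultdict(int)
    for slot, _clicks, terms in docs:
        for term in terms:
            if slot == current_slot:
                cur_df[term] += 1
            elif slot < current_slot:
                past_df[term] += 1
    n_past = max(current_slot, 1)  # slots are assumed to be numbered from 0
    return {t for t, df in cur_df.items()
            if df >= min_df and df > ratio * past_df.get(t, 0) / n_past}


def build_graph(docs, current_slot, bursty, min_edge_weight=10):
    """Co-occurrence graph over bursty terms, weighted by clicks in the
    current slot. Weak edges are pruned to keep the graph small (an
    assumed size-reduction step, not necessarily the paper's)."""
    node_w, edge_w = defaultdict(int), defaultdict(int)
    for slot, clicks, terms in docs:
        if slot != current_slot:
            continue
        kept = sorted(terms & bursty)
        for term in kept:
            node_w[term] += clicks
        for a, b in combinations(kept, 2):
            edge_w[(a, b)] += clicks
    edges = {e: w for e, w in edge_w.items() if w >= min_edge_weight}
    return dict(node_w), edges


def group_terms(nodes, edges):
    """Stand-in for weighted community detection: connected components
    of the pruned graph, ignoring isolated terms."""
    adj = {n: set() for n in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, groups = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        if len(component) > 1:
            groups.append(sorted(component))
    return groups


def detect_events(docs, current_slot):
    bursty = bursty_terms(docs, current_slot)
    nodes, edges = build_graph(docs, current_slot, bursty)
    return group_terms(nodes, edges)


events = detect_events(DOCS, current_slot=2)
# events == [["earthquake", "rescue"], ["election", "vote"]]
```

In this toy run, "city" is not bursty because it appears just as often in earlier slots, and the single low-click document that mentions both "earthquake" and "election" produces an edge too weak to survive pruning, so the two events stay separate.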
