Abstract

When an event occurs in the real world, numerous news reports describing it start to appear on different news sites within a few minutes. This can confront users with a huge amount of information, and automated processes may be required to help manage it. In this paper, we describe a clustering system that groups news reports from disparate sources into event-centric clusters, i.e., clusters of news reports describing the same event. A user can identify any RSS feed as a source of news he/she would like to receive, and our system clusters reports from the separate RSS feeds as they arrive, without knowing the number of clusters in advance. The system was designed to function well in an online incremental environment. In our evaluation, we found that the system performs very well at fine-grained clustering but rather poorly at coarser-grained clustering.

Highlights

  • Within some time of an event happening in the real world, numerous news reports will appear on news sites on the World Wide Web that describe that event

  • When analysing the results obtained from the evaluation of our clustering system, one can notice that our system attains very poor results when clustering the Reuters-RCV1 corpus

  • The clustering is performed by representing incoming news reports as


Summary

Introduction

Within some time of an event happening in the real world, numerous news reports describing that event will appear on news sites on the World Wide Web. Presenting huge quantities of information to a user may cause confusion, and some of the knowledge contained in this information may remain hidden: the user may not have time to go through the entire corpus to find each nugget of information, or may miss it because it lies "buried" in familiar information. Automated processes such as Document Fusion and Recommendation systems can assist the user in his/her quest for knowledge discovery. The tasks of such automated systems may be made simpler by a mechanism that clusters news stories together by specific events (e.g., news reports on the murders in Norway by Anders Breivik). There is a growing need for techniques to handle this increasing flood of incoming data and avoid delay in its distribution [6,9].
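To make the idea of event-centric clustering in an online incremental setting more concrete, the following is a minimal sketch of single-pass clustering in which the number of clusters is not fixed in advance. It is an illustration only, not the system described in this paper: the term-count representation, the cosine similarity measure, the threshold value of 0.3, and the names `IncrementalClusterer`, `tokenize`, and `cosine` are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class IncrementalClusterer:
    """Hypothetical single-pass clusterer; not the authors' implementation."""

    def __init__(self, threshold=0.3):
        # Assumed similarity cut-off: lower values give coarser clusters.
        self.threshold = threshold
        self.centroids = []  # summed term counts, one Counter per cluster
        self.members = []    # report ids, one list per cluster

    def add(self, report_id, text):
        """Assign one incoming report to a cluster and return its index."""
        vector = Counter(tokenize(text))
        best_index, best_score = None, 0.0
        for index, centroid in enumerate(self.centroids):
            score = cosine(vector, centroid)
            if score > best_score:
                best_index, best_score = index, score
        if best_index is not None and best_score >= self.threshold:
            # Similar enough: merge the report into the existing cluster.
            self.centroids[best_index].update(vector)
            self.members[best_index].append(report_id)
            return best_index
        # Otherwise the report starts a new cluster.
        self.centroids.append(vector)
        self.members.append([report_id])
        return len(self.centroids) - 1
```

Each report is processed once as it arrives: it either joins the most similar existing cluster or opens a new one, so the cluster count grows with the stream. In such a scheme the threshold governs granularity, with a higher value producing finer-grained clusters.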

