Abstract

This paper presents WebCQ, a continual query system for large-scale Web information monitoring. WebCQ is designed to discover and detect changes to Web pages efficiently, and to notify users of interesting changes with personalized messages. Users' Web page monitoring requests are modeled as continual queries on the Web and are referred to as Web page sentinels. The system consists of five main components: a change detection robot that discovers and detects changes; a proxy cache service that reduces communication traffic to the original information provider on the remote server; a trigger evaluation tool that passes through only the changes that meet specified thresholds; a personalized change presentation tool that highlights Web page changes; and a change notification service that displays and delivers interesting changes and fresh information to the right users at the right time. This paper describes the WebCQ system with an emphasis on the general issues in designing and engineering a large-scale information change monitoring system on the Web. There are two main contributions. First, we present the mechanisms that WebCQ provides to support various types of Web page sentinels for finding and displaying interesting changes to Web pages. The large collection of sentinel types allows WebCQ to efficiently locate and monitor a wide range of changes in Web pages. Second, we develop sentinel grouping techniques for efficient and scalable processing of a large number of concurrently running triggers and Web page sentinels. We report initial experimental results showing the effectiveness of the proposed solutions.
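
To make the idea of sentinel grouping and threshold-based trigger evaluation concrete, the following is a minimal sketch, not WebCQ's actual implementation. It assumes that sentinels on the same URL can share a single page fetch, that a plain dictionary stands in for the proxy cache, and that a trigger fires when a word-level change ratio meets a per-user threshold. The names `Sentinel`, `group_by_url`, `change_ratio`, `evaluate`, and `min_change_ratio` are all hypothetical and do not come from the paper.

```python
import difflib
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sentinel:
    """Hypothetical record of one user's monitoring request."""
    user: str
    url: str
    min_change_ratio: float  # assumed trigger threshold in [0, 1]


def group_by_url(sentinels):
    """Group sentinels so each monitored page is fetched once per cycle."""
    groups = defaultdict(list)
    for s in sentinels:
        groups[s.url].append(s)
    return dict(groups)


def change_ratio(old_text, new_text):
    """Word-level dissimilarity: 0.0 for identical text, up to 1.0."""
    matcher = difflib.SequenceMatcher(None, old_text.split(), new_text.split())
    return 1.0 - matcher.ratio()


def evaluate(sentinels, cache, fetch):
    """Return (user, url, ratio) for every sentinel whose trigger fires.

    `cache` maps URL -> last seen text (an assumed stand-in for a proxy cache);
    `fetch` is a caller-supplied function mapping URL -> current page text.
    """
    notifications = []
    for url, group in group_by_url(sentinels).items():
        new_text = fetch(url)      # one fetch shared by the whole group
        old_text = cache.get(url)
        cache[url] = new_text
        if old_text is None or old_text == new_text:
            continue               # first visit, or nothing changed
        ratio = change_ratio(old_text, new_text)
        for s in group:
            if ratio >= s.min_change_ratio:
                notifications.append((s.user, url, ratio))
    return notifications
```

In this sketch the cost of fetching and differencing a page is paid once per URL rather than once per sentinel, which is the kind of saving grouping is meant to provide; only the cheap threshold comparison is repeated for each user.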
