Abstract

Various approaches have been proposed to reduce network-related delays in serving dynamic web pages. A fundamental problem common to several of these approaches is how to automatically find shared fragments across large numbers of web pages. The same problem also arises in studies of web content characteristics at fragment granularity. This paper gives a formal definition of the problem, presents an efficient and scalable algorithm for solving it, and describes applications of the algorithm. In the problem definition, we introduce the notion of a compound fragment, and our definition of a maximal shared fragment captures the real characteristics of fragments that are suitable for individual delivery and caching. Our algorithm has two distinctive features: (1) it finds true maximal shared fragments, and (2) it handles large collections of web pages effectively by exploiting database techniques. The algorithm has been implemented and applied to 16 large sets of web pages. The experiments show that it scales to large numbers of web pages and, when used in fragment-based web caching, yields significant bandwidth savings and latency reduction.
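To make the shared-fragment detection task concrete, the following is a minimal Python sketch, not the paper's algorithm: it approximates a fragment as a fixed-width run of markup tokens and reports runs that recur across pages. The function names (`tokenize`, `shared_fragments`) and parameters (`width`, `min_pages`) are illustrative assumptions; a real implementation along the paper's lines would additionally merge overlapping hits into maximal fragments, handle compound fragments, and use database techniques to scale.

```python
# A minimal sketch of shared-fragment detection across web pages.
# Assumptions (not from the paper): pages are plain HTML strings, a
# "fragment" is approximated by a contiguous run of `width` markup tokens,
# and "shared" means the run appears in at least `min_pages` pages.
import re
from collections import defaultdict

def tokenize(page: str) -> list[str]:
    # Split markup into coarse tokens: tags and intervening text runs.
    return [t for t in re.split(r"(<[^>]+>)", page) if t.strip()]

def shared_fragments(pages: dict[str, str], width: int = 3, min_pages: int = 2):
    """Return token windows of `width` tokens shared by >= `min_pages` pages."""
    index: dict[tuple[str, ...], set[str]] = defaultdict(set)
    for name, page in pages.items():
        tokens = tokenize(page)
        for i in range(len(tokens) - width + 1):
            index[tuple(tokens[i:i + width])].add(name)
    # Note: these windows overlap; finding *maximal* shared fragments would
    # require merging adjacent shared windows, which this sketch omits.
    return {frag: owners for frag, owners in index.items() if len(owners) >= min_pages}

if __name__ == "__main__":
    pages = {
        "a.html": "<div><ul><li>News</li><li>Sports</li></ul></div><p>today</p>",
        "b.html": "<header></header><ul><li>News</li><li>Sports</li></ul><p>weather</p>",
    }
    for frag, owners in shared_fragments(pages).items():
        print(sorted(owners), "share:", "".join(frag))
```

Run on the two toy pages above, the sketch reports the shared navigation list while ignoring the page-specific paragraphs, which is exactly the kind of fragment that is worth caching and delivering individually.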
