Abstract
To reduce network-related delays in serving dynamic web pages, various approaches have been proposed. A fundamental problem shared by several of these approaches is how to automatically find shared fragments in large numbers of web pages; the same problem also arises in studies of web content characteristics at fragment granularity. This paper gives a formal definition of the problem, presents an efficient and scalable algorithm for it, and introduces applications of the algorithm. In the problem definition, we introduce the notion of a compound fragment, and our definition of maximal shared fragment captures the real characteristics of fragments that are appropriate for individual delivery and caching. Our algorithm has two unique features: (1) it finds true maximal shared fragments, and (2) it handles large collections of web pages effectively by utilizing database techniques. The algorithm has been implemented and applied to 16 large sets of web pages. The experiments show that the algorithm can effectively handle large numbers of web pages and can provide significant bandwidth savings and latency reduction when used in fragment-based web caching.
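To make the fragment-finding task concrete, the sketch below shows one simple way to detect fragments shared across pages, using token shingling and a greedy merge. It is not the paper's algorithm: the shingle width W, the whitespace tokenizer, and the merge heuristic are illustrative assumptions, and a real system would operate on parsed HTML and use the database-backed techniques the paper describes.

```python
# A minimal sketch of shared-fragment detection via token shingling.
# This is NOT the paper's algorithm: the shingle width W and the greedy
# merge step are illustrative assumptions only.

from collections import defaultdict

W = 5  # shingle width in tokens (hypothetical tuning parameter)


def tokenize(page: str) -> list[str]:
    # Naive whitespace tokenization; a real system would parse HTML.
    return page.split()


def shared_fragments(pages: dict[str, str]) -> list[str]:
    """Return maximal token runs that occur in at least two pages."""
    tokens = {pid: tokenize(text) for pid, text in pages.items()}

    # Map each W-token shingle to the set of pages containing it.
    occurs = defaultdict(set)
    for pid, ts in tokens.items():
        for i in range(len(ts) - W + 1):
            occurs[tuple(ts[i:i + W])].add(pid)
    shared = {sh for sh, pids in occurs.items() if len(pids) >= 2}

    # Greedily merge consecutive shared shingles within each page into
    # maximal runs, approximating "maximal shared fragments".
    fragments = set()
    for ts in tokens.values():
        i = 0
        while i <= len(ts) - W:
            if tuple(ts[i:i + W]) in shared:
                j = i
                while j <= len(ts) - W and tuple(ts[j:j + W]) in shared:
                    j += 1
                fragments.add(" ".join(ts[i:j + W - 1]))
                i = j
            else:
                i += 1
    return sorted(fragments)


if __name__ == "__main__":
    pages = {
        "a.html": "<div id=nav> home news sports </div> <p> story one </p>",
        "b.html": "<div id=nav> home news sports </div> <p> story two </p>",
    }
    for frag in shared_fragments(pages):
        print(frag)  # the shared navigation fragment is reported once
```

Hashing fixed-width shingles keeps the scan over the page collection linear in total token count, which is one reason shingle-style indexing is a common starting point for scaling fragment detection to large collections.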