Abstract

The rapid growth of the World Wide Web is increasing the demand for efficient distribution of, and fast access to, information around the world. As hardware upgrades are not always able to keep up with the increase in web traffic, considerable latency is often experienced in retrieving web objects from the Internet during peak periods. Therefore, better content distribution techniques, in addition to hardware upgrades, are needed to improve the performance of web accesses. Caching and replication are the two primary content distribution approaches to enhancing web performance. This thesis investigates four related issues in web caching and replication.

In the first part of the thesis, we study cache cooperation strategies at different levels of the Internet hierarchy. At the upper level (e.g., that of a regional ISP), web caches are geographically distributed and often structured in a cascaded fashion, where requests that miss a lower-level cache are forwarded to a higher-level cache. We present a general analytical framework for coordinated management of cascaded caches. The optimal locations for caching objects are computed by a dynamic programming algorithm. Based on the framework, a novel caching scheme that integrates both object placement and replacement strategies is proposed. At the lower level of the Internet hierarchy (e.g., that of a client organization), web caches are geographically clustered. Hash routing is an effective coordination technique for improving the overall hit ratio and reducing the outbound traffic of clustered caches. We have developed an analytical model for hash routing and investigated the problem of determining the optimal object and DNS allocation strategies to minimize the average response time of client requests. The analytical results are applied to the design of two adaptive hash routing schemes, for static and dynamic client configurations respectively.
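As a rough illustration of the basic hash routing idea referred to above, the following Python sketch maps each requested URL to exactly one cache in a cluster, so that no object is stored in more than one cache. It is a minimal sketch under stated assumptions, not the adaptive schemes developed in the thesis: the function name `route_request`, the cache names, and the use of SHA-256 with a simple modulo are all hypothetical choices made for illustration.

```python
import hashlib


def route_request(url: str, caches: list[str]) -> str:
    """Pick the cache responsible for `url` by hashing the URL.

    Assumption: a plain "hash modulo cluster size" mapping. The thesis
    studies optimized object and DNS allocations, which this does not model.
    """
    if not caches:
        raise ValueError("the cache cluster must contain at least one cache")
    # A stable hash, so every client maps the same URL to the same cache.
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(caches)
    return caches[index]


# Hypothetical cluster of three caches in one client organization.
cluster = ["cache-a", "cache-b", "cache-c"]
chosen = route_request("http://example.com/index.html", cluster)
```

Because the mapping depends only on the URL and the cluster membership, repeated requests for the same object always reach the same cache, which is what avoids duplicate copies across the cluster.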
An important price to pay for web replication is that the content provider needs to keep the replicas consistent with the authoritative origin copy. In the second part of the thesis, we study the optimal cost of consistency management in two different scenarios. In the first scenario, the construction of a distribution tree is investigated with the objective of minimizing the cost of expiration-based consistency management for geographically distributed replicas. This is formulated as an optimization problem and is proven to be NP-hard. The optimal distribution tree is identified in some special cases, and three heuristic algorithms are proposed for the general problem. In the second scenario, the replication of dynamic web contents is studied. Unlike static web contents, which involve only file fetches when they are requested, dynamic contents are constructed by running application programs on base data. Therefore, the cost of consistency management includes both the network cost of update transfers and the computation cost of object reconstruction. We present a theoretical framework for minimal-cost replication of dynamic web contents under a flat model of update delivery. An optimal polynomial-time solution is proposed that determines where each data object should be replicated and how to keep the replicas up to date.
