AN ANALYSIS OF PROCEDURES AND OBJECTIVE FUNCTIONS FOR HEURISTICS TO PERFORM DATA STAGING IN DISTRIBUTED SYSTEMS

Mitchell D Theys, Howard Jay Siegel, Noah B Beck, Michael Jurczyk

doi:10.1142/s0219265906001703

Abstract

The data staging problem involves positioning data within a distributed heterogeneous computing environment so that programs can access requested data more quickly. The problem arises because applications constantly need up-to-date information to enable users to make decisions, and these requests for information typically occur in an oversubscribed network. In such an environment, each data storage location and intermediate node may have different data available, different storage limitations, and different communication links. Sites in the heterogeneous network request data items, and each request has an associated deadline and priority. This work extends the research presented in [ThT00], where a basic version of the data staging problem with static information was presented. It introduces three new cost criteria and two new bounds on performance, designed taking into account the results from [ThT00]. A subset of the possible procedure/cost criterion combinations is evaluated in simulation studies that consider a different priority weighting scheme, a different average number of links used to satisfy each data request, and different network loadings than were considered in [ThT00]. This paper also introduces a variable-time, variable-accuracy approach for using data items with "more desirable" and "less desirable" versions.

Full Text