Efficient reuse of replicated parallel data segments in computational grids

Sandip Tikar, Sathish Vadhiyar

doi:10.1016/j.future.2008.01.001

Abstract

Grids are used to execute parallel applications over remote resources. To execute a parallel application on a set of grid resources chosen by a user or a grid scheduler, the input data is segmented according to the application's data distribution, and the segments are distributed to those resources. The same input data may later be used by other applications, which leads to multiple copies (replicas) of parallel data segments on various grid resources. The data needed by a parallel application can then be gathered from the existing replicas onto the computational resources chosen by the grid scheduler. In this work, we devise novel algorithms that determine the “nearest” replica sites holding the data segments needed by a parallel application executing on a set of resources, with the objective of minimizing the time needed to transfer the segments from the replica sites to those resources. We tested our algorithms on different kinds of experimental setups and found that the best algorithm varies with the configuration of data servers and clients. In all cases, our algorithms outperformed the existing algorithms by at least 15%.
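To make the selection problem concrete, the following is a minimal sketch of a naive per-segment baseline. It is not one of the algorithms proposed in this work. It assumes that each transfer time can be estimated as segment size divided by a known point-to-point bandwidth, and that transfers do not interfere with one another. The function name, the dictionary layout, and all site, client, and segment names are hypothetical.

```python
from typing import Dict, List, Tuple


def pick_nearest_replicas(
    needs: Dict[str, List[str]],
    replicas: Dict[str, List[str]],
    bandwidth: Dict[Tuple[str, str], float],
    segment_size: Dict[str, float],
) -> Dict[Tuple[str, str], str]:
    """Naive baseline (illustrative assumption, not the paper's method).

    needs:        client -> list of segment ids it requires
    replicas:     segment id -> list of replica sites holding it
    bandwidth:    (site, client) -> estimated bandwidth, e.g. MB/s
    segment_size: segment id -> size, e.g. MB

    Returns a plan mapping (client, segment) -> chosen replica site.
    Each choice is made independently, ignoring contention at a site.
    """
    plan: Dict[Tuple[str, str], str] = {}
    for client, segments in needs.items():
        for seg in segments:
            # Choose the site with the smallest estimated transfer time.
            plan[(client, seg)] = min(
                replicas[seg],
                key=lambda site: segment_size[seg] / bandwidth[(site, client)],
            )
    return plan


# Hypothetical setup: two clients, two replica sites, two segments.
needs = {"c1": ["s0"], "c2": ["s1"]}
replicas = {"s0": ["A", "B"], "s1": ["B"]}
bandwidth = {("A", "c1"): 10.0, ("B", "c1"): 40.0, ("B", "c2"): 20.0}
segment_size = {"s0": 400.0, "s1": 200.0}

plan = pick_nearest_replicas(needs, replicas, bandwidth, segment_size)
```

In this hypothetical setup both transfers are assigned to site B. Because every choice is made in isolation, the sketch cannot account for several clients drawing from the same site at once, which is one reason a joint selection over all segments and clients may do better than a per-segment choice.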

Full Text