Abstract

The data grid is one of the key infrastructures for managing data and storage resources. Data replication can improve data access performance and fault tolerance, and reduce network transmission bandwidth. Because storage space at each grid site is limited, using grid resources effectively becomes an important challenge. This paper introduces a dynamic grid replication algorithm based on popularity, support, and confidence (BPSC). With this algorithm, data and its replicas can be placed on suitable sites, reducing access latency. Simulation results obtained with OptorSim show that the algorithm provides better performance than other algorithms.

Introduction

Grid computing is an important branch of distributed computing. It constructs a virtual computer system from a large number of heterogeneous resources (CPU and storage) located at different sites, and provides a model for solving large-scale computing and storage problems. The data grid is the grid mainly devoted to data storage and management; it is defined by the Data Grid Group as an effective combination of the data and computing resources of distributed systems. Large-scale data-intensive applications, such as bioinformatics, high-energy physics, and spatial information, generate vast amounts of data, often on the terabyte scale and even up to the petabyte scale. It is not practical to maintain such large data sets at every site while also providing computing resources to visitors; at the same time, storing and accessing these data in a purely centralized way increases access latency for every site. Data replication is an effective method to reduce access latency in a data grid system.

Data replication is one of the basic techniques used in distributed systems to improve access efficiency, ease maintenance, and keep the system highly available. Its goal is to store data at appropriate sites so that users can access massive data effectively. Data replication can reduce data access latency and network bandwidth consumption, avoid congestion, balance the load on the server side, and improve system reliability. Many data replication algorithms can reduce access latency and improve reliability, but deciding which data to copy, when to copy it, and where to place the copies is the essence of any replication algorithm. In this paper, we introduce a real-time dynamic data replication algorithm. The algorithm replicates data based on the dynamic popularity of files and takes the storage space limitations of each site into account (a rough sketch of this idea is given below). The algorithm is verified through OptorSim simulation. This paper is organized as follows: Section 2 reviews related work on replication in data grids. In Section 3, the system model and its components are presented. The file
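The paper does not give pseudocode at this point, but the following minimal sketch illustrates the general idea of popularity-driven replication under a site storage limit. All names here (Site, decide_replication, POPULARITY_THRESHOLD) are illustrative assumptions, not part of the BPSC algorithm, which additionally uses support and confidence measures to choose replica locations.

```python
# Hypothetical sketch of popularity-driven replication with limited site storage.
# Names and thresholds are illustrative; the actual BPSC algorithm also uses
# support/confidence measures when deciding where to place replicas.

from collections import Counter

POPULARITY_THRESHOLD = 3  # assumed: replicate a file once it is requested this often


class Site:
    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity          # total storage in file-size units
        self.replicas = {}                # file_id -> size
        self.access_count = Counter()     # local popularity of each file

    def record_access(self, file_id):
        self.access_count[file_id] += 1

    def decide_replication(self, file_id, file_size):
        """Replicate a popular file locally if space allows; otherwise evict
        less popular replicas to make room (simplified eviction policy)."""
        if self.access_count[file_id] < POPULARITY_THRESHOLD or file_id in self.replicas:
            return False
        free = self.capacity - sum(self.replicas.values())
        if free < file_size:
            # Pick the least popular existing replicas as eviction candidates.
            victims, reclaimed = [], 0
            for victim, size in sorted(self.replicas.items(),
                                       key=lambda kv: self.access_count[kv[0]]):
                if self.access_count[victim] >= self.access_count[file_id]:
                    break  # only evict files less popular than the newcomer
                victims.append(victim)
                reclaimed += size
                if free + reclaimed >= file_size:
                    break
            if free + reclaimed < file_size:
                return False  # cannot free enough space without hurting popular files
            for victim in victims:
                free += self.replicas.pop(victim)
        self.replicas[file_id] = file_size
        return True


# Usage: a site with capacity 10 sees repeated requests for file "A" of size 4.
site = Site("site-1", capacity=10)
for _ in range(3):
    site.record_access("A")
print(site.decide_replication("A", 4))  # True: popular enough and space available
```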
