A Genetic Algorithm Based Data Replica Placement Strategy for Scientific Applications in Clouds

Lizhen Cui, Dong Yuan, Junhua Zhang, Lingxi Yue, Hui Li, Yuliang Shi

doi:10.1109/tsc.2015.2481421

Abstract

Cloud computing is a promising distributed computing platform for big data applications, e.g., scientific applications, since abundant resources can be obtained from cloud services for processing and storing both existing and generated application datasets. However, when tasks process big data stored in distributed data centers, the inevitable data movements cause high bandwidth costs and execution delays. In this paper, we construct a tripartite-graph-based model to formulate the data replica placement problem and propose a genetic-algorithm-based data replica placement strategy for scientific applications to reduce data transmissions in the cloud. Our approach can reduce 1) the size of moved data, 2) the time of data movement, and 3) the number of movements. We conduct experiments to compare the proposed strategy with the random placement strategy used in the Hadoop Distributed File System (HDFS). The results demonstrate that our strategy performs better for scientific applications in clouds.
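To make the idea of a genetic-algorithm-based placement concrete, the following is a minimal illustrative sketch, not the algorithm proposed in the paper. It assumes a simplified setting that the abstract does not specify: each dataset has a single copy, each task runs in a fixed data center, and fitness is only the total size of moved data (the first of the three objectives above). All names (`transfer_cost`, `genetic_placement`, the parameters, and the toy instance) are hypothetical.

```python
import random


def transfer_cost(placement, tasks, sizes):
    """Total size of data that must be moved so every task can run.

    placement[d] is the data center holding dataset d (assumed single copy);
    each task is a pair (data_center, [indices of datasets it reads]).
    """
    return sum(sizes[d]
               for dc, needed in tasks
               for d in needed
               if placement[d] != dc)


def genetic_placement(tasks, sizes, n_centers, pop_size=30,
                      generations=100, mutation_rate=0.1, seed=0):
    """Search for a low-cost placement with a simple elitist GA."""
    rng = random.Random(seed)
    n = len(sizes)

    def cost(p):
        return transfer_cost(p, tasks, sizes)

    # A chromosome assigns one data center index to each dataset.
    population = [[rng.randrange(n_centers) for _ in range(n)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the cheaper half of the population unchanged.
        population.sort(key=cost)
        survivors = population[:pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            # One-point crossover between two surviving parents.
            cut = rng.randrange(1, n) if n > 1 else 0
            child = a[:cut] + b[cut:]
            # Mutation: occasionally move a dataset to a random data center.
            for i in range(n):
                if rng.random() < mutation_rate:
                    child[i] = rng.randrange(n_centers)
            children.append(child)
        population = survivors + children
    best = min(population, key=cost)
    return best, cost(best)


# Toy instance (made up for illustration): three data centers, four datasets.
tasks = [(0, [0, 1]), (1, [1, 2]), (2, [3])]
sizes = [10, 40, 5, 20]
best, cost = genetic_placement(tasks, sizes, n_centers=3)
print(best, cost)
```

In this toy instance dataset 1 is read by tasks in two different data centers, so with a single copy at least 40 units must be moved whatever the placement. Allowing multiple replicas per dataset, as the paper's problem formulation does, is what removes that lower bound; extending the chromosome to hold a set of data centers per dataset would be one way to model it.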

Full Text