Abstract

Distributed storage systems commonly adopt erasure codes to ensure data reliability, because erasure codes offer better space efficiency and higher reliability than replication. However, existing data insertion schemes for erasure codes are ill-suited to continuous online data: centralized insertion schemes suffer from a bottleneck at the central node, and decentralized insertion schemes have low efficiency. In this paper, we propose POIS, a pipelined online data insertion scheme for distributed storage systems with erasure codes. For efficiency, we propose a distance-aware node selection technique that improves transmission efficiency by selecting nodes with higher available bandwidth. Moreover, we propose a distributed data processing technique that maximizes encoding efficiency by pipelining data transmission and distributing the encoding operations. For adaptivity, we propose a rollback-based failure processing technique that handles node failures during the insertion process. To evaluate the performance of POIS, we conduct experiments on HDFS-RAID with 200 virtual machines under various parameter settings. Extensive experiments confirm that POIS improves insertion throughput, adapts the insertion process by handling node failures during insertion, and significantly outperforms state-of-the-art approaches under various parameter settings.
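
The abstract names the three techniques without detailing them. As a rough illustration only, and not the POIS implementation, the following Python sketch combines generic versions of the three ideas under simplifying assumptions. It uses a single XOR parity chunk (a RAID-5-style code) rather than a general code such as Reed-Solomon, and it stands in for distance-aware selection by simply sorting nodes by available bandwidth. Encoding is done hop by hop: each data node folds its chunk into a partial parity and forwards it, and if any node fails, everything written for that stripe is undone. All names (`Node`, `bandwidth_mbps`, `select_nodes`, `insert_stripe`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    bandwidth_mbps: float                      # hypothetical: measured available bandwidth
    alive: bool = True                         # hypothetical: simulated failure flag
    store: dict = field(default_factory=dict)  # stripe_id -> stored chunk


def select_nodes(candidates, k):
    """Pick the k live nodes with the highest available bandwidth
    (a stand-in for the paper's distance-aware selection)."""
    live = [n for n in candidates if n.alive]
    if len(live) < k:
        raise RuntimeError("not enough live nodes")
    return sorted(live, key=lambda n: n.bandwidth_mbps, reverse=True)[:k]


def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def insert_stripe(stripe_id, chunks, chain):
    """Relay a stripe along `chain`: each data node stores its chunk, XORs it
    into the partial parity, and forwards the partial parity to the next hop.
    The last node stores the finished parity. On any node failure, roll back
    every write made for this stripe and report failure."""
    if len(chain) != len(chunks) + 1:
        raise ValueError("chain needs one node per data chunk plus one parity node")
    if len({len(c) for c in chunks}) != 1:
        raise ValueError("all chunks must have the same size")
    written = []
    partial = bytes(len(chunks[0]))  # all-zero initial partial parity
    try:
        for node, chunk in zip(chain[:-1], chunks):
            if not node.alive:
                raise ConnectionError(node.name)
            node.store[stripe_id] = chunk
            written.append(node)
            partial = xor_bytes(partial, chunk)
        parity_node = chain[-1]
        if not parity_node.alive:
            raise ConnectionError(parity_node.name)
        parity_node.store[stripe_id] = partial
        return True
    except ConnectionError:
        # Rollback: remove partial writes so no half-inserted stripe remains.
        for node in written:
            node.store.pop(stripe_id, None)
        return False


if __name__ == "__main__":
    nodes = [Node(f"n{i}", bw) for i, bw in enumerate([100, 400, 250, 50, 300])]
    chunks = [b"\x01\x02", b"\x04\x08", b"\x10\x20"]
    chain = select_nodes(nodes, len(chunks) + 1)
    ok = insert_stripe("s0", chunks, chain)
    print(ok, chain[-1].name, chain[-1].store["s0"].hex())  # True n0 152a
```

After a failed insertion, the caller can call `select_nodes` again (failed nodes are excluded) and retry the stripe. This sketch processes each stripe sequentially along the chain; a real pipelined scheme would split chunks into small packets so that different hops transmit and encode concurrently.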
