Abstract

Erasure coding is one of the key technologies in big data storage systems. A well-designed erasure code can not only improve the reliability of a big data storage system but also greatly improve its performance. Most existing big data storage systems use a replica strategy, which provides good availability and real-time access but causes considerable data redundancy and wastes storage space. A large part of the data held in such systems is cold data. In this paper, we target cold data, which does not place high demands on data availability or real-time access. We propose a scheme that supports both the replica strategy and the coding strategy, and we design the corresponding node scheduling and data addressing schemes. We select the Liberation code, which performs well on write operations, and develop the P-Schedule scheme to optimize decoding speed. Together, these designs effectively improve disk utilization and write speed for cold data in a big data system. Test results show that the sequential write performance of erasure coding is better than that of the replica strategy, and that performance improves as the data block size increases.
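The disk-utilization argument can be illustrated with a short calculation. The sketch below is not taken from the paper: the function names are hypothetical, and the parameters (3 replicas, k = 6 data blocks) are assumed for illustration only. The one fixed point is m = 2 parity blocks, since Liberation codes are RAID-6 codes that tolerate two failures.

```python
# Illustrative only: compares usable-capacity fractions for replication and
# a (k, m) erasure code. Parameter values are assumptions, not the paper's.

def replication_utilization(replicas: int) -> float:
    """Fraction of raw disk space holding unique user data under replication."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return 1.0 / replicas


def erasure_utilization(k: int, m: int) -> float:
    """Fraction of raw disk space holding user data with k data and m parity blocks."""
    if k < 1 or m < 0:
        raise ValueError("k must be at least 1 and m must be non-negative")
    return k / (k + m)


def raw_bytes_needed(user_bytes: float, utilization: float) -> float:
    """Raw capacity required to store user_bytes at the given utilization."""
    return user_bytes / utilization


if __name__ == "__main__":
    rep = replication_utilization(3)      # assumed 3-way replication
    ec = erasure_utilization(k=6, m=2)    # assumed k = 6; m = 2 as in RAID-6 codes
    print(f"replication utilization: {rep:.3f}")
    print(f"erasure code utilization: {ec:.3f}")
    print(f"raw TB for 100 TB of cold data (replication): {raw_bytes_needed(100, rep):.1f}")
    print(f"raw TB for 100 TB of cold data (erasure code): {raw_bytes_needed(100, ec):.1f}")
```

Under these assumed parameters the erasure-coded layout stores the same cold data in well under half the raw capacity that replication needs, while still tolerating the loss of two blocks per stripe.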
