Abstract

With the widespread use of distributed systems and the development of big data technology, storage redundancy, data availability, and persistence in distributed systems have become major concerns. Traditional multi-copy storage can incur large storage redundancy and thus waste storage space. Erasure codes are considered the best alternative to the replication strategy. However, existing methods do not fully consider the access characteristics of files, so data with high access popularity may be stored with computationally expensive erasure codes, resulting in poor parallel read/write performance. Moreover, because file size is not adequately taken into account, storage redundancy cannot be minimized, which wastes storage space. We therefore propose FACHS, an adaptive hybrid storage strategy based on file access characteristics. Cold files, i.e., files with low access frequency, are stored with Reed-Solomon (RS) codes, which have low computation and storage costs. Hot files are stored with multi-copy replication when small and with local reconstruction codes (LRC) when large. Multi-copy replication ensures efficient file reads and writes and supports parallel access, while LRC keeps the recovery cost after a node failure very low. Experimental results show that, compared with existing methods, FACHS reduces storage space occupation by 12% for cold files, and improves read/write speed by 8% and recovery efficiency by 29% for hot files.
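
The abstract describes a three-way selection between RS codes, multi-copy replication, and LRC based on access popularity and file size. The following is a minimal, hypothetical sketch of that selection logic; the thresholds (HOT_ACCESS_THRESHOLD, SMALL_FILE_BYTES) and all names are illustrative assumptions, not values or interfaces from the paper.

```python
# Hypothetical sketch of a FACHS-style storage-scheme selection.
# Thresholds and names are assumptions for illustration only.
from dataclasses import dataclass
from enum import Enum


class Scheme(Enum):
    RS_CODE = "Reed-Solomon erasure code"    # cold files: low storage overhead
    MULTI_COPY = "multi-copy replication"    # small hot files: fast parallel read/write
    LRC_CODE = "local reconstruction code"   # large hot files: cheap node-failure recovery


@dataclass
class FileStats:
    size_bytes: int
    accesses_per_day: float


HOT_ACCESS_THRESHOLD = 100.0      # assumed popularity cutoff (accesses/day)
SMALL_FILE_BYTES = 64 * 1024**2   # assumed small-file cutoff (64 MiB)


def choose_scheme(stats: FileStats) -> Scheme:
    """Pick a storage scheme from access popularity and file size."""
    if stats.accesses_per_day < HOT_ACCESS_THRESHOLD:
        return Scheme.RS_CODE        # cold: minimize storage redundancy
    if stats.size_bytes <= SMALL_FILE_BYTES:
        return Scheme.MULTI_COPY     # hot and small: favor read/write speed
    return Scheme.LRC_CODE           # hot and large: favor recovery cost


if __name__ == "__main__":
    print(choose_scheme(FileStats(size_bytes=10 * 1024**2, accesses_per_day=500)))
    print(choose_scheme(FileStats(size_bytes=2 * 1024**3, accesses_per_day=500)))
    print(choose_scheme(FileStats(size_bytes=2 * 1024**3, accesses_per_day=1)))
```

In practice such a policy would also need to re-evaluate files as their popularity changes, which is what makes the strategy adaptive; the sketch above only shows the static decision for a single file.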
