A popularity-aware reconstruction technique in erasure-coded storage systems

Ting Cao, Xiaopu Peng, Chaowei Zhang, Taha Khalid Al Tekreeti, Jianzhou Mao, Xiao Qin, Jianzhong Huang

doi:10.1016/j.jpdc.2020.08.003

Abstract

In this study, we develop a novel data reconstruction technique for parallel storage systems housed in modern data centers. We advocate using erasure-coded storage systems to archive warm data (a.k.a. unpopular data), which attract a limited number of accesses or updates. Unlike hot or cold data, warm data must be treated in a distinctive way to optimize system performance and storage-space utilization. We pay particular attention to efficient data reconstruction, in which faulty data nodes are rebuilt while the system continues to respond to I/O requests. To achieve this goal, we employ two machine-learning algorithms to offer online data reconstruction in erasure-coded storage systems. Our data reconstruction technique recovers faulty nodes while boosting read performance for requests that access data residing on those nodes. Our system relies on a clustering mechanism that groups files into multiple clusters, within each of which files share similar features. Furthermore, we implement a prediction module that projects a list of future popular data by keeping track of historical I/O accesses. This popular-data list, in turn, predicts which files are likely to be accessed in the near future. The prediction module is responsible for computing similarities among users, thereby setting the priority levels of the data blocks to be reconstructed. We implement our data reconstruction scheme in an erasure-coded parallel storage system, where files are recovered with guidance from the popular-data list. Our experimental results confirm that our system speeds up data recovery in parallel storage systems while maintaining high data access performance for online users.
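The idea of letting a popular-data list steer reconstruction can be illustrated with a minimal sketch. It is not the authors' implementation: plain access counts stand in for the machine-learning prediction module described above, and the names `popularity_scores`, `reconstruction_order`, and the `(block_id, file_name)` layout are hypothetical, chosen only for illustration.

```python
import heapq
from collections import Counter


def popularity_scores(access_log):
    """Count accesses per file in a historical I/O log.

    Assumption: a raw access count is a stand-in for the predicted
    popularity that the paper derives with machine-learning algorithms.
    """
    return Counter(access_log)


def reconstruction_order(lost_blocks, scores):
    """Order lost blocks so that blocks of popular files are rebuilt first.

    lost_blocks: list of (block_id, file_name) pairs on the faulty node
                 (hypothetical layout).
    scores:      mapping from file_name to a popularity score.
    Ties are broken by block_id, lowest first.
    """
    # heapq is a min-heap, so the score is negated to pop the most popular first.
    heap = [(-scores.get(name, 0), block_id, name) for block_id, name in lost_blocks]
    heapq.heapify(heap)
    order = []
    while heap:
        _, block_id, name = heapq.heappop(heap)
        order.append((block_id, name))
    return order


if __name__ == "__main__":
    log = ["a", "b", "a", "c", "a", "b"]
    lost = [(0, "c"), (1, "a"), (2, "b"), (3, "d")]
    print(reconstruction_order(lost, popularity_scores(log)))
```

In this sketch, blocks belonging to files that were never accessed receive a score of zero and are rebuilt last, so requests for frequently read files are less likely to hit a block that has not yet been recovered.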

Full Text