Latent sector errors (LSEs) in disk drives cause significant outages, data loss, and unreliability in large-scale cloud storage systems, posing not only technical challenges but also environmental concerns in the context of carbon recycling. Predicting LSEs can help avoid these problems and improve cloud reliability while contributing to a more sustainable cloud infrastructure. Ensemble classifiers typically outperform individual classifiers for LSE prediction and achieve high accuracy, but they can lead to underfitting and incur additional computational cost, complexity, time, and memory consumption. This research addresses this challenge with a twofold solution: optimizing the ensemble diversity of the Random Forest (RF) classifier through accuracy sliding window-based ensemble pruning (SWEP-RF), and using the pruned ensemble to predict LSEs in cloud storage. By effectively predicting and mitigating LSEs, this approach reduces unnecessary energy consumption and carbon emissions associated with data recovery and reprocessing, aligning with carbon recycling goals. SWEP-RF maximizes its lower margin distribution to adapt the RF prediction performance and produce a strong-performing, effective subensemble, further enhancing the overall energy efficiency of cloud systems. Our approach also reduces ensemble size while maintaining high prediction accuracy, leading to more sustainable resource utilization. We evaluate our algorithm using datasets from Baidu Inc. and Backblaze datacenters. Experimental results demonstrate that our approach achieves over 98.6% prediction accuracy, a low false alarm rate (FAR) of 0.003%, and an extended mean time to data loss (MTTDL), with a lead time in advance (LTA) of up to 383.4 hours and 474.3 hours, respectively. SWEP-RF outperforms classical models and current state-of-the-art techniques in prediction accuracy, FAR, MTTDL, processing time, memory consumption, and cloud availability, highlighting its significance in not only enhancing cloud storage reliability but also reducing the carbon footprint of cloud services. Our method is a promising solution for enhancing cloud storage reliability through proactive LSE prediction, while addressing the urgent need for sustainable practices and carbon recycling in cloud computing.
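The abstract reports prediction accuracy and FAR without stating their formulas. As a minimal sketch, assuming the conventional definitions (accuracy as the fraction of correct predictions, FAR as the fraction of healthy samples wrongly flagged), the two metrics could be computed as follows. The label encoding (1 for an LSE, 0 for healthy) and the function names are illustrative assumptions, not taken from the paper.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn), assuming 1 = LSE and 0 = healthy (hypothetical encoding)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


def false_alarm_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of healthy samples that were wrongly predicted as LSEs."""
    _, fp, tn, _ = confusion_counts(y_true, y_pred)
    healthy = fp + tn
    return fp / healthy if healthy else 0.0


# Toy labels for illustration only; these are not data from the paper.
toy_true = [1, 1, 0, 0, 0, 0]
toy_pred = [1, 0, 1, 0, 0, 0]
toy_accuracy = accuracy(toy_true, toy_pred)
toy_far = false_alarm_rate(toy_true, toy_pred)
```

Under these definitions, a FAR of 0.003% would correspond to roughly 3 false alarms per 100,000 healthy samples.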