A Combined System Metrics Approach to Cloud Service Reliability Using Artificial Intelligence

Tek Raj Chhetri, Artjom Lind, Satish Narayana Srirama, Chinmaya Kumar Dehury, Anna Fensel

doi:10.3390/bdcc6010026

Open Access

Abstract

Identifying and anticipating potential failures in the cloud is an effective way to increase cloud reliability and enable proactive failure management. Many studies have sought to predict potential failures, but none has combined SMART (self-monitoring, analysis, and reporting technology) hard drive metrics with other system metrics, such as central processing unit (CPU) utilisation. We therefore propose a combined system metrics approach to failure prediction, based on artificial intelligence, to improve reliability. We evaluated data from over 100 cloud servers using four artificial intelligence algorithms (random forest, gradient boosting, long short-term memory, and gated recurrent unit) and also performed a correlation analysis. The correlation analysis sheds light on the relationships between system metrics and failure, and the experimental results demonstrate the advantages of combining system metrics, outperforming the state of the art.
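
The kind of correlation analysis mentioned in the abstract can be illustrated with a minimal sketch. It assumes a plain Pearson correlation between each metric and a binary failure label; the abstract does not state which correlation measure the authors used, and the metric names and sample values below are hypothetical toy data, not data from the study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def rank_metrics(samples, labels):
    """Rank metrics by the absolute value of their correlation with failure."""
    scores = {name: pearson(values, labels) for name, values in samples.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical toy readings (assumption: one value per observation window).
samples = {
    "reallocated_sectors": [0, 0, 1, 0, 8, 12, 0, 15],   # SMART-style metric
    "cpu_utilisation": [35, 40, 38, 42, 90, 85, 37, 95],  # system metric
    "fan_speed": [3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000],
}
failed = [0, 0, 0, 0, 1, 1, 0, 1]  # 1 = failure observed in that window

for name, score in rank_metrics(samples, failed):
    print(f"{name}: {score:+.3f}")
```

With a binary label, the Pearson coefficient is the point-biserial correlation, which is why it is a reasonable first look at how a single metric relates to failure; it does not capture non-linear or joint effects across metrics.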

Highlights

In this study, taking advantage of advances in artificial intelligence (AI), we focus on failure prediction using four AI techniques: random forest (RF), gradient boosting (GB), long short-term memory (LSTM), and gated recurrent unit (GRU).
The aim of this study is to improve the reliability of cloud services by improving cloud server failure prediction, using the selected AI techniques and the combined system metrics approach.
We hypothesised that combining multiple system metrics would improve failure prediction, and in this paper we present our research on cloud server failure prediction.
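
One way to picture the combined system metrics idea from the highlights above is as a join of two per-server data streams into a single feature record. The sketch below is an assumption for illustration only: the source does not describe how the authors aligned their data, and the field names (`server_id`, `timestamp`, `smart_5`, `cpu_utilisation`) and the values are hypothetical.

```python
def combine_metrics(smart_rows, system_rows):
    """Join SMART and system readings on (server_id, timestamp).

    Readings without a counterpart in the other stream are skipped, so every
    combined record carries both kinds of metrics.
    """
    system_index = {
        (row["server_id"], row["timestamp"]): row for row in system_rows
    }
    combined = []
    for smart in smart_rows:
        key = (smart["server_id"], smart["timestamp"])
        system = system_index.get(key)
        if system is None:
            continue  # no system reading for this server and time
        merged = dict(system)   # copy, so the input rows stay unchanged
        merged.update(smart)    # add the SMART fields to the same record
        combined.append(merged)
    return combined


# Hypothetical toy rows, not data from the study.
smart_rows = [
    {"server_id": "s1", "timestamp": 1, "smart_5": 0},
    {"server_id": "s1", "timestamp": 2, "smart_5": 8},
    {"server_id": "s2", "timestamp": 1, "smart_5": 0},
]
system_rows = [
    {"server_id": "s1", "timestamp": 1, "cpu_utilisation": 35.0},
    {"server_id": "s1", "timestamp": 2, "cpu_utilisation": 90.0},
]

features = combine_metrics(smart_rows, system_rows)
```

Each element of `features` could then serve as one input row for a classifier such as a random forest, or, ordered by timestamp, as one step of a sequence for an LSTM or GRU model.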

Summary

Introduction

Cloud computing has emerged as the fifth utility over the last decade and is a backbone of the modern economy [1]. It is a model of computing that allows flexible use of virtual servers, massive scalability, and management services for the delivery of information services. With the low-cost pay-per-use model of on-demand computing [2], the cloud has grown massively over the years, in both size and complexity.

Objectives

Methods

Conclusion