Abstract

With the growth of cloud computing, the resiliency of cloud systems is critical to enterprises' business. However, the continuously changing nature of the cloud makes evaluating cloud resiliency more difficult. In this study, we design a methodology for the automatic and continuous evaluation of cloud resiliency and implement it in a tool called CRGauge. The continuous evaluation methodology leverages fault injection techniques to inject faults and an open-source library to set up synthetic workloads for the test campaign. Our experimental results on the OpenStack cloud platform show that the resiliency of OpenStack needs to be improved, especially under heavy workloads.
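As a rough illustration of such a test campaign iteration, the sketch below starts a synthetic workload, injects a fault, and measures the time until a health probe succeeds again. The workload command (Rally), the injected fault, and the health probe are assumptions for an OpenStack-like target, not CRGauge's actual interface.

```python
import subprocess
import time

# Illustrative sketch of one iteration of a fault-injection test campaign.
# Function names, the probe command, and the injected fault are placeholders,
# not the tool's actual interface.

def service_is_healthy(probe_cmd):
    """Return True if the health probe of the target service succeeds."""
    return subprocess.run(probe_cmd, capture_output=True).returncode == 0

def run_campaign_iteration(start_workload, inject_fault, probe_cmd, timeout_s=300):
    """Inject a fault under synthetic load and measure time-to-recovery."""
    workload = subprocess.Popen(start_workload)        # synthetic workload generator
    try:
        inject_fault()                                 # e.g., kill a service process
        injected_at = time.monotonic()
        while time.monotonic() - injected_at < timeout_s:
            if service_is_healthy(probe_cmd):
                return time.monotonic() - injected_at  # recovery time in seconds
            time.sleep(1)
        return None                                    # no recovery within the window
    finally:
        workload.terminate()

if __name__ == "__main__":
    recovery = run_campaign_iteration(
        start_workload=["rally", "task", "start", "boot-and-delete.json"],  # assumed workload tool
        inject_fault=lambda: subprocess.run(["pkill", "-f", "nova-api"]),   # assumed fault
        probe_cmd=["openstack", "server", "list"],                          # assumed health probe
    )
    print("recovery time (s):", recovery)
```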

Highlights

  • Cloud environments play a critical role in delivering IT services to end users because the cloud offers high resource utilization; fast, convenient resource provisioning and deprovisioning; continuous management, maintenance, and upgrade of machines that is transparent to end users; and low management overhead

  • Fault injection is applied to measure system resiliency

  • This paper proposes a methodology that automatically measures cloud system resiliency, enabling its continuous evaluation

Summary

Introduction

Cloud environments play a critical role in delivering IT services to end users because the cloud offers high resource utilization; fast, convenient resource provisioning and deprovisioning; continuous management, maintenance, and upgrade of machines that is transparent to end users; and low management overhead. In order to enable evaluation in a continuous fashion, it is essential to automatically deploy the test environment, which includes the target cloud systems or services, the workloads, and the setup of the fault injection experiments. The deployment module leverages capabilities of the target cloud system to set up workloads, deploy the fault injection engine for certain types and scenarios of faults/failures, prepare the resiliency-computation models, and perform other configuration tasks. Certain parameters of the resiliency models are given in addition to the state transition diagrams of the target cloud components' resiliency (as exemplified in Fig. 2), including probability distributions of the different types of failures and recovery time values for failure types that are difficult to measure programmatically (e.g., when an error is only detected by the relevant staff, a three-shift daily work schedule and a one-shift-only work schedule result in different recovery times). Snapshots would be leveraged by the Cleaner if the target cloud system is composed of virtual machines.
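For concreteness, the sketch below shows one way such model parameters could be combined into a single availability figure. The failure types, probabilities, recovery times, and the simple weighted-expectation formula are illustrative assumptions, not the paper's actual resiliency-computation model.

```python
# Minimal sketch: combine failure-type probabilities and per-type recovery times
# (the kind of parameters fed into the resiliency models) into an availability
# estimate. All numbers below are assumed values for illustration only.

failure_probabilities = {          # P(failure type | a failure occurred)
    "process_crash": 0.6,
    "vm_failure": 0.3,
    "host_failure": 0.1,
}

recovery_time_s = {                # mean time to recover per failure type (seconds)
    "process_crash": 30,           # measurable programmatically
    "vm_failure": 300,             # measurable programmatically
    "host_failure": 8 * 3600,      # human-detected: depends on the staff work schedule
}

def expected_recovery_time(probs, mttr):
    """Expected downtime per failure, weighted by failure-type probability."""
    return sum(probs[f] * mttr[f] for f in probs)

def availability(mtbf_s, probs, mttr):
    """Steady-state availability: uptime / (uptime + expected downtime)."""
    e_down = expected_recovery_time(probs, mttr)
    return mtbf_s / (mtbf_s + e_down)

# Example: one failure per week on average (assumed MTBF).
print(availability(mtbf_s=7 * 24 * 3600, probs=failure_probabilities, mttr=recovery_time_s))
```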

Experimental Setup
Result and Analysis
Conclusion