Mitigating Virtualization Failures Through Migration to a Co-Located Hypervisor

Frederico Cerveira, Henrique Madeira, Raul Barbosa

doi:10.1109/access.2021.3098644

Open Access

Abstract

Many organizations are moving their systems to the cloud, where providers consolidate multiple clients using virtualization, which creates challenges for business-critical applications. Research has shown that hypervisors fail, often causing common-mode failures that may abruptly disrupt dozens of virtual machines simultaneously. We hypothesize and empirically show that a significant percentage of virtual machines affected by a hypervisor failure are capable of continuing execution on a new hypervisor. Supported by this observation, we design a technique for recovering from hypervisor failures through efficient virtual machine migration to a co-located hypervisor, which allows virtual machines to continue executing with minimal downtime and which can be transparently applied to existing applications. We evaluate a proof-of-concept implementation using fault injection of hardware and software faults and show that it can recover, on average, 41-46% of all virtual machines, with a mean virtual machine downtime of 3 seconds.

Highlights

Cloud computing infrastructures provide elastic resources to organizations, enabling them to deploy scalable online applications and services while reducing the fixed costs of IT infrastructures [1]
Fault injection experiments presented in this paper show that our hypothesis holds and suggest that virtual machines (VMs) can be recovered after hypervisor failures
The experiments measure recovery effectiveness, migration time, downtime and runtime overhead

Summary

Introduction

Cloud computing infrastructures provide elastic resources to organizations, enabling them to deploy scalable online applications and services while reducing the fixed costs of IT infrastructures [1]. Virtualization is one of the enabling technologies supporting cloud computing initiatives. Cloud providers rent their physical infrastructure to multiple tenants, using virtualization to execute up to hundreds of virtual machines (VMs) on a single, powerful physical machine [4]. Although this approach is very cost-effective, it creates the risk of common-mode failures [5], which have been observed in

Objectives

Methods

Results

Conclusion