Abstract

Live migration of virtual machines is the ability to move running virtual machines between two computers with minimal downtime. Although various migration mechanisms such as pre-copy, post-copy, and state compression have been proposed, they may suffer from long migration times when the migrating virtual machines run large computation- and memory-intensive workloads. This paper presents the design and implementation of a novel Time-bound, thread-based Live Migration (TLM) mechanism, in which additional threads are added to the pre-copy live migration algorithm to handle virtual machine state transfers within a bounded time period. Under the time-bound principle, the upper bound on the migration time of a virtual machine is proportional to the size of the virtual machine's memory. We propose a CPU over-committing mechanism to minimize migration downtime and to avoid performance impacts on other virtual machines while the migration threads are in operation. We have implemented a prototype of TLM on KVM and conducted experiments by migrating virtual machines running a number of Class D OpenMP and MPI NAS Parallel Benchmarks. Experimental results showed the following: (i) TLM finished live migration within a bounded time period, and users are able to measure the progress of the migration operation. (ii) The CPU over-committing mechanism can be used to minimize live migration downtime. However, the communication performance of virtual machines during live migration also declined as the number of over-committed CPUs was reduced. The patterns of decline depended on the execution behaviors of the applications on the virtual machines. (iii) The increases in execution time of the OpenMP and MPI versions of the MG and IS benchmarks in our experiments were approximately equal to the migration times of TLM. (iv) We evaluated our CPU over-committing mechanism against the auto-convergence mechanism recently developed in kvm-1.6 and found that both mechanisms have their pros and cons, and that their performance results vary with the application. Based on these results, we believe that the TLM design is practical for live migration of virtual machines running memory-intensive workloads, and that the time-bound principle is an important new feature for pre-copy live migration optimization.
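
As a rough illustration of the time-bound principle stated in the abstract, the sketch below computes a migration-time upper bound that scales linearly with the virtual machine's memory size. The function name and the proportionality constant `seconds_per_gib` are hypothetical and chosen only for illustration; the abstract states the proportionality but gives no constant, and this is not the TLM implementation.

```python
def migration_time_bound(memory_gib: float, seconds_per_gib: float) -> float:
    """Return an upper bound (in seconds) on migration time.

    The bound is linear in the VM's memory size, mirroring the
    time-bound principle. `seconds_per_gib` is a hypothetical
    proportionality constant, not a value taken from the paper.
    """
    if memory_gib < 0:
        raise ValueError("memory_gib must be non-negative")
    if seconds_per_gib <= 0:
        raise ValueError("seconds_per_gib must be positive")
    return memory_gib * seconds_per_gib


# Doubling the memory size doubles the bound under this linear model.
small_vm_bound = migration_time_bound(16, seconds_per_gib=10.0)
large_vm_bound = migration_time_bound(32, seconds_per_gib=10.0)
```

Under this assumed linear model, a VM with twice the memory gets twice the time budget, regardless of how quickly the workload dirties pages.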
