Optimizing pre-copy live virtual machine migration in cloud computing using machine learning-based prediction model

Raseena M Haris, Mahmoud Barhamgi, Armstrong Nhlabatsi, Khaled M Khan

doi:10.1007/s00607-024-01318-6

Abstract

One of the preconditions for efficient cloud computing services is the continuous availability of services to clients. However, services can become temporarily unavailable for many reasons, including routine maintenance, load balancing, cyber-attacks, power management, fault tolerance, emergency incident response, and resource usage. Live Virtual Machine Migration (LVM) addresses service unavailability by moving virtual machines between hosts without disrupting running services. Pre-copy memory migration is a common LVM approach in cloud systems, but it is challenged by a high rate of frequently updated memory pages, known as dirty pages. Transferring these dirty pages during pre-copy migration prolongs the overall migration time. If a large number of memory pages remain after a predefined number of page-transfer iterations, the stop-and-copy phase is initiated with a large backlog, which significantly increases downtime and negatively impacts service availability. To mitigate this issue, we introduce a prediction-based approach that optimizes the migration process by dynamically halting the iteration phase when the predicted downtime falls below a predefined threshold. We evaluated the proposed machine learning method through experiments on a dedicated testbed using KVM/QEMU technology, involving different VM sizes and memory-intensive workloads. A comparative analysis against other proposed pre-copy methods and the default migration approach shows a substantial improvement: an average 64.91% reduction in downtime across different RAM configurations under high-write-intensive workloads, along with an average reduction in total migration time of approximately 85.81%. These findings underscore the practical advantages of our method in reducing service disruptions during live virtual machine migration in cloud systems.
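The stopping rule described in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' implementation: the machine learning predictor is replaced here by a simple analytic estimate (remaining dirty bytes divided by bandwidth), and the names `estimate_downtime`, `simulate_precopy`, the 4 KiB page size, and the constant dirty rate are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # bytes per memory page (assumed for illustration)


@dataclass
class MigrationResult:
    iterations: int      # pre-copy rounds performed
    downtime_s: float    # duration of the final stop-and-copy phase
    total_time_s: float  # pre-copy time plus downtime


def estimate_downtime(dirty_pages: int, bandwidth_bytes_per_s: float) -> float:
    """Stand-in for the ML predictor: time to send the remaining
    dirty pages while the VM is paused."""
    return dirty_pages * PAGE_SIZE / bandwidth_bytes_per_s


def simulate_precopy(total_pages: int,
                     dirty_rate_pages_per_s: float,
                     bandwidth_bytes_per_s: float,
                     downtime_threshold_s: float,
                     max_iterations: int = 30) -> MigrationResult:
    """Iterate pre-copy rounds; halt early once the predicted downtime
    is at or below the threshold, else stop at max_iterations."""
    to_send = total_pages  # the first round transfers all memory
    elapsed = 0.0
    iterations = 0
    while iterations < max_iterations:
        iterations += 1
        round_time = to_send * PAGE_SIZE / bandwidth_bytes_per_s
        elapsed += round_time
        # Pages dirtied while this round was in flight (capped at VM size).
        to_send = min(total_pages, int(dirty_rate_pages_per_s * round_time))
        if estimate_downtime(to_send, bandwidth_bytes_per_s) <= downtime_threshold_s:
            break  # predicted downtime is acceptable: go to stop-and-copy
    downtime = estimate_downtime(to_send, bandwidth_bytes_per_s)
    return MigrationResult(iterations, downtime, elapsed + downtime)


if __name__ == "__main__":
    # 1 GiB VM, roughly 1 Gbit/s link, 10,000 dirtied pages per second.
    result = simulate_precopy(262_144, 10_000, 125_000_000, 0.3)
    print(result)
```

When the dirty rate exceeds what the link can carry, the loop in this sketch never converges and falls back to the iteration cap, which corresponds to the long stop-and-copy case the abstract describes.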

Full Text