Abstract

Wide-area virtual machine (VM) live migration can serve as a disaster-recovery solution for IT services by moving virtualized servers to safe locations when a critical disaster strikes. In this scenario, it is desirable to evacuate as many VMs as possible under limited and changing electrical power and network conditions. The challenges are that 1) when many VMs are migrated simultaneously, the migration time of each individual VM increases, which raises the probability of migration failure due to power or network outages, 2) migrating VMs sequentially may not use the network efficiently, and 3) network conditions, such as available bandwidth and congestion, fluctuate over time. This calls for solving a multi-objective problem that simultaneously reduces the total migration time and the individual migration times. In this paper, we focus on precopy migration and present 1) the design and implementation of a feedback-based control system that manages VM migrations across multiple servers and tackles the aforementioned challenges, 2) findings from extensive experiments, and 3) a metric for evaluating migration performance that takes into account both the total and the individual migration times. The proposed system monitors the network usage of hosts, adjusts migration parameters, and coordinates the migration scheduling of VMs. It is a promising approach for efficiently transferring IT services from a damaged data center to a fully functional one by automatically managing migrations across data centers. Experiments are conducted with several combinations of parameters, including network conditions, migration strategies, controller type, memory distribution, and live/offline VM migrations.
The results show 1) the usefulness of a feedback-based controller with a global view that can coordinate multiple physical machines to use network resources efficiently and reduce migration times, 2) the factors that affect the migration performance of multiple hosts, 3) the potential for improving sequential VM migration by integrating support for parallel TCP connections, and 4) that the proposed control system finds a near-optimal operating point that balances the total migration time against the individual migration times.
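
To make the monitor-adjust-coordinate loop described above concrete, the following is a minimal sketch of one possible feedback rule, not the controller implemented in the paper. It assumes the controller's only knob is the number of VMs migrated in parallel and that a single link-utilization sample (a fraction between 0 and 1) is available at each step. The class name `ConcurrencyController`, the thresholds, and the limits are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ConcurrencyController:
    """Hypothetical feedback rule: choose how many VMs migrate in parallel."""

    low_water: float = 0.70   # assumed: below this, the link is underused
    high_water: float = 0.95  # assumed: above this, the link is saturated
    min_parallel: int = 1
    max_parallel: int = 8
    parallel: int = 1         # current number of concurrent migrations

    def update(self, utilization: float) -> int:
        """Adjust concurrency from one link-utilization sample in [0, 1]."""
        if utilization < self.low_water:
            # Spare bandwidth: start one more migration to shorten the total time.
            self.parallel = min(self.parallel + 1, self.max_parallel)
        elif utilization > self.high_water:
            # Saturated link: back off so each individual migration finishes sooner.
            self.parallel = max(self.parallel - 1, self.min_parallel)
        # Inside the band the current level is kept unchanged.
        return self.parallel


controller = ConcurrencyController()
samples = [0.40, 0.60, 0.97, 0.85]  # made-up utilization readings
levels = [controller.update(u) for u in samples]
print(levels)
```

The dead band between the two thresholds is a design choice in this sketch: it keeps the concurrency level from oscillating when utilization hovers near a single set point. A real controller would also have to adjust per-migration parameters and schedule across hosts, which this example leaves out.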
