Abstract

Fault tolerance has become increasingly critical for virtualized systems as a growing number of mission-critical applications are deployed on virtual machines rather than directly on physical machines. However, prior hardware-based fault-tolerant systems require extensive modifications to existing hardware, which makes them infeasible for industry practitioners. Although software-based techniques realize fault tolerance without any hardware modification, they suffer from significant latency overhead, often orders of magnitude higher than what is acceptable. To realize practical low-latency fault tolerance in virtualized environments, we first identify two bottlenecks in prior approaches: the overhead of tracking dirty pages in software, and the long sequential dependency in checkpointing system states. To shorten this sequential dependency, we design a novel mechanism that asynchronously prefetches dirty pages without disrupting the execution of the primary VM. We then develop Phantasy, a system that leverages the page-modification logging (PML) technology available on commodity processors to reduce dirty page tracking overhead, and that asynchronously prefetches dirty pages through remote direct memory access (RDMA). Evaluated on 25 real-world applications, Phantasy reduces the performance overhead by 38 percent on average and reduces latency by 85 percent compared to a state-of-the-art virtualization-based fault-tolerant system.
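
The central claim above is that copying dirty pages in the background shortens the stop-and-copy step of each checkpoint. A toy Python model can make that concrete. It is a minimal sketch under stated assumptions, not Phantasy's implementation: `DirtyTracker`, `synchronous_pause`, and `async_prefetch_pause` are hypothetical names; the tracker only mimics the log-on-first-write behavior of hardware dirty logging such as PML; and "prefetching" is a counter standing in for the RDMA transfers described in the abstract.

```python
"""Toy model of one checkpoint epoch (hypothetical; not Phantasy's code).

Compares how many pages must be copied while the primary VM is paused:
  - synchronous: every page dirtied in the epoch is copied during the pause
  - asynchronous prefetch: pages logged earlier are pulled in the background,
    so the pause only covers pages dirtied (or re-dirtied) after the last drain
"""
from dataclasses import dataclass, field


@dataclass
class DirtyTracker:
    """Mimics hardware dirty-page logging (an assumption, loosely PML-like):
    a page is appended to the log only on its first write since the last
    drain, like a dirty bit going from 0 to 1."""
    dirty: set[int] = field(default_factory=set)
    log: list[int] = field(default_factory=list)

    def write(self, page: int) -> None:
        if page not in self.dirty:
            self.dirty.add(page)
            self.log.append(page)

    def drain(self) -> list[int]:
        """Return logged pages and clear dirty state so later writes are re-logged."""
        pages, self.log = self.log, []
        self.dirty.clear()
        return pages


def synchronous_pause(writes: list[int]) -> int:
    """Baseline: track the whole epoch, then copy every dirty page while paused."""
    tracker = DirtyTracker()
    for page in writes:
        tracker.write(page)
    return len(tracker.drain())


def async_prefetch_pause(writes: list[int], drain_every: int) -> tuple[int, int]:
    """Drain the log every `drain_every` writes and count those pages as
    prefetched while the VM keeps running; return (pages prefetched,
    pages still copied during the pause)."""
    tracker = DirtyTracker()
    prefetched = 0
    for i, page in enumerate(writes, start=1):
        tracker.write(page)
        if i % drain_every == 0:
            prefetched += len(tracker.drain())  # background copy (RDMA stand-in)
    return prefetched, len(tracker.drain())  # remainder copied during the pause


if __name__ == "__main__":
    writes = [1, 2, 3, 1, 4, 5, 6, 2, 7, 8]  # guest page numbers, in write order
    print(synchronous_pause(writes))        # 8 pages copied while paused
    print(async_prefetch_pause(writes, 4))  # (7, 2): 7 prefetched, 2 during pause
```

Draining more often shrinks the pause but can copy a page more than once when it is re-dirtied (page 2 in the example is transferred twice), so total traffic rises slightly in exchange for a shorter critical path.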
