Abstract

Container-based virtualization in Infrastructure-as-a-Service (IaaS) Clouds is gaining interest as a platform for running distributed applications. As Cloud architectures grow in scale, faults become a frequent occurrence, which makes availability a challenge. LXCloudFT is a fault-tolerant Cloud system composed of LXCloud-CR, a Checkpoint–Restart model, and GC-CR, a garbage collector component that eliminates old snapshots of containers. LXCloudFT was originally designed for scientific applications, and all its components are decentralized. We want to adapt it to serve stateless, loosely coupled applications such as web applications, for which replication is a method to survive failures. This paper addresses the issue of replication and contributes a novel replication model, LXCloud-Rep, to LXCloudFT. LXCloud-Rep is a replication model with versioning and garbage collection that replicates Linux Container instances on several nodes in a decentralized manner. Following a node failure, LXCloud-Rep restarts the failed containers on a new node from distributed container images rather than from snapshots. It also optimizes the use of storage space. Large-scale experiments on Grid’5000 show that LXCloud-Rep improves the performance of applications.

