Abstract

Container-based cloud services, which achieve scalability at low cost by dividing a complex system into instances, functions, or applications, are essential for operating mission-critical industrial systems. For these services to serve as the base environment of major systems, mission assurance and survivability are required as core functions. In particular, the operating environment of a mission-critical system must guarantee service resilience, that is, the ability to keep providing stable services even when cyberattacks or service failures would otherwise interrupt them. To solve this problem, we propose iContainer, which stands for Immortal Container. It continuously records container services so that, when a failure occurs, service can quickly return to a point chosen by the user. Given efficient checkpointing, the lifecycles of containers are recorded, and services are rolled back to a user-selected earlier point when a critical event occurs. iContainer makes three contributions. First, it minimizes checkpointing operations through checkpoint zoning. A semantic-aware hot/cold container classification scheme removes unnecessary checkpointing operations for zones where data rarely changes. Second, it achieves rapid checkpointing through dirty-page tracking, which minimizes checkpoint data read/write operations by efficiently tracking the modified memory area. Consequently, iContainer reduces the checkpoint execution time by a factor of 3.27 compared to the conventional checkpointing scheme, and it reduces the size of the data generated by repetitive checkpointing by 69.2%. Third, iContainer provides rapid checkpoint restoration and flexible restore points. We designed the software-defined checkpoint/restore (SDCR) tool, which enables rapid restoration and flexible selection of checkpoints and restore points.
Experimental results show that 337 ms elapse on average between the detection of a service failure and the return to stable service operation. Thus, the rollback of SDCR is approximately 1.93 times faster than that of the conventional checkpointing tool, Checkpoint/Restore in Userspace (CRIU). The experiments were conducted in an environment running a web service. Moreover, because iContainer records the lifecycle of containers through checkpoints, it can be used both to restore services after errors and as a source of data for identifying the cause of an attack or security incident.
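
The idea of writing only changed pages at each checkpoint, and of rolling back to any recorded point, can be illustrated with a minimal Python sketch. This is not the SDCR tool or CRIU: the class `IncrementalCheckpointer`, the 4096-byte `PAGE_SIZE`, and the fixed-size memory buffer are all assumptions made for illustration, and dirty pages are found here by comparing page contents in user space rather than by any kernel-level tracking.

```python
from typing import Dict, List

PAGE_SIZE = 4096  # assumed page size, for illustration only


class IncrementalCheckpointer:
    """Toy model: each checkpoint stores only pages changed since the last one.

    Assumes the memory buffer keeps the same length across checkpoints.
    """

    def __init__(self) -> None:
        # Page index -> page content as of the most recent checkpoint.
        self._last: Dict[int, bytes] = {}
        # One dict of dirty pages per checkpoint, in chronological order.
        self._deltas: List[Dict[int, bytes]] = []

    def checkpoint(self, memory: bytes) -> int:
        """Record the dirty pages and return how many pages were written."""
        pages = {
            offset // PAGE_SIZE: memory[offset:offset + PAGE_SIZE]
            for offset in range(0, len(memory), PAGE_SIZE)
        }
        dirty = {
            index: data
            for index, data in pages.items()
            if self._last.get(index) != data
        }
        self._deltas.append(dirty)
        self._last = pages
        return len(dirty)

    def restore(self, point: int) -> bytes:
        """Rebuild memory as of checkpoint `point` by replaying deltas 0..point."""
        pages: Dict[int, bytes] = {}
        for delta in self._deltas[:point + 1]:
            pages.update(delta)
        return b"".join(pages[index] for index in sorted(pages))
```

In this sketch the first checkpoint writes every page, a later checkpoint writes only the pages modified in between, and an unchanged container writes nothing, which is the effect that zoning and dirty-page tracking aim for at a much lower level. Passing an earlier index to `restore` corresponds to choosing a flexible restore point.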
