Improving Bank-Level Parallelism for In-Memory Checkpointing in Hybrid Memory Systems

Xiaofei Liao, Haikun Liu, Zhan Zhang, Hai Jin

doi:10.1109/tbdata.2018.2865964

Abstract

Checkpoint/recovery has been widely used in many highly available and reliable systems. This paper proposes Shadow, an application-transparent, in-memory checkpointing mechanism based on a hybrid memory system composed of DRAM and emerging Non-Volatile Memory (NVM). Shadow adopts a pre-copy-based checkpointing mechanism to reduce system downtime. It supports fine-grained, incremental checkpointing at frequencies of up to 100 times per second. In this context, checkpointing can significantly degrade application performance because applications and the checkpointing process contend for memory. Previous checkpointing mechanisms for hybrid memory systems have focused on the performance of checkpointing itself and have overlooked the impact of memory contention on application performance. In this paper, we mitigate memory contention at the bank level by carefully scheduling memory requests to fully exploit the idle time slots of different memory banks. Moreover, when bank conflicts are unavoidable, Shadow raises the priority of applications' memory requests to reduce their access latency. By redesigning the memory controllers of DRAM and NVM, we implement a hardware-assisted checkpointing mechanism that transfers data directly from working memory to the checkpoint in NVM, without any CPU intervention. Our evaluation shows that Shadow reduces memory bank conflicts between applications and checkpointing by 75%, and decreases applications' memory read request latency by 28% on average, compared with pre-copy-based checkpointing. Shadow also reduces checkpointing overhead by 42% and 16% on average compared with the stop-and-copy and pre-copy-based checkpointing approaches, respectively.
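
The bank-level scheduling idea in the abstract can be illustrated with a toy software model. The sketch below is not Shadow's memory controller design. It assumes a simplified timing model in which each bank serves at most one request per cycle, and all names (`schedule_cycle`, `app_banks`, `ckpt_queue`) are hypothetical. Application requests always win a bank, and checkpoint requests are issued only to banks that are idle in that cycle; conflicting checkpoint requests are deferred to a later cycle.

```python
from collections import deque


def schedule_cycle(app_banks, ckpt_queue):
    """Issue checkpoint requests for one cycle (toy model, assumed timing).

    app_banks:  bank ids targeted by application requests this cycle.
    ckpt_queue: deque of bank ids targeted by pending checkpoint copies.
    Returns the list of bank ids on which checkpoint requests were issued.
    """
    # Application requests are prioritized: their banks are busy first.
    busy = set(app_banks)
    issued = []
    deferred = deque()

    while ckpt_queue:
        bank = ckpt_queue.popleft()
        if bank in busy:
            # Bank conflict: defer the checkpoint request, keeping its order.
            deferred.append(bank)
        else:
            # Idle bank: use its free slot for the checkpoint copy.
            busy.add(bank)
            issued.append(bank)

    # Deferred requests wait for a later cycle.
    ckpt_queue.extend(deferred)
    return issued


# Usage: checkpoint copies target banks 0, 1, 1, 2 while the
# application accesses bank 1 in the first cycle only.
pending = deque([0, 1, 1, 2])
first = schedule_cycle([1], pending)   # banks 0 and 2 are idle
second = schedule_cycle([], pending)   # bank 1 is now free
third = schedule_cycle([], pending)    # remaining request to bank 1
```

In this model the application never waits on a checkpoint request, at the cost of delaying checkpoint copies that target a contended bank.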

Full Text