The Time Warp synchronization protocol for Parallel Discrete Event Simulation (PDES) is universally considered a viable solution to exploit the intrinsic simulation model parallelism and to provide model execution speedup. Yet it leads the PDES system to execute events in an order that may generate causal inconsistencies that need to be recovered via rollback , which requires restoration of a previous (consistent) simulation state whenever a causality violation is detected. The rollback operation is so critical for the performance of a Time Warp system that it has been extensively studied in the literature for decades to find approaches suitable to optimize it. The proposed solutions can be roughly classified as based on either checkpointing or reverse computing . In this article, we explore the practical design and implementation of a fully new approach based on the runtime generation of so-called undo code blocks , which are blocks of instructions implementing the reverse memory side effects generated by the forward execution of the events. However, this is not done by recomputing the original values to be restored, as instead it occurs in reverse computing schemes. Hence, the philosophy undo code blocks rely on is similar in spirit to that of undo-logs (as a form of checkpointing). Nevertheless, they are not data logs (as instead checkpoints are); rather, they are logs of instructions. Our proposal is fully transparent, thanks to the reliance on static software instrumentation (targeting the x86 architecture and Linux systems). Also, as we show, it can be combined with classical checkpointing to further improve the runtime behavior of the state recoverability support as a function of the workload. We also present experimental results related to our implementation, which is released as free software and fully integrated into the open source ROOT-Sim package. Experimental data support the viability and effectiveness of our proposal.
Read full abstract