Detailed modeling and evaluation of computer architectures generally require a cycle-accurate simulator. As architectures grow increasingly complex to support parallel, cloud, and neural computing, simulator complexity grows rapidly as well, and simulation becomes too slow or even infeasible for practical use. To alleviate this problem, many previous studies have focused on reducing simulation time in a variety of ways, such as sampling methods and hardware accelerators. In this paper, we propose a new parallel simulation framework, called Epoch-based Parallel SIMulator, that obtains scalable speedup with a large number of cores. The framework is based on a well-known cycle-accurate full-system simulator, MARSSx86, which combines PTLSim with QEMU. Within this simulator, we define an epoch as an execution interval during which the architectural simulation by PTLSim does not involve any interaction with QEMU. Epochs can therefore be simulated independently: given their live-in data, multiple epochs are executed completely in parallel by PTLSim. Our performance evaluation shows an average speedup of $12.8\times$ with 16-core parallel simulation on the SPEC CPU2006 and PARSEC benchmarks, demonstrating scalable performance.
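The epoch idea can be illustrated with a minimal sketch. It is not the framework's implementation: the `Epoch` type, the `warmup_cycles` key, the per-instruction latencies, and the toy timing model are all hypothetical names and assumptions introduced here for illustration. The sketch shows only that epochs carrying their own live-in data can be simulated in any order and combined afterwards, and that a $12.8\times$ speedup on 16 cores corresponds to a parallel efficiency of about 0.8.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Epoch:
    # Hypothetical snapshot of the state an epoch needs before it starts.
    live_in: Dict[str, int]
    # Hypothetical per-instruction latencies standing in for real work.
    instructions: Tuple[int, ...]


def simulate_epoch(epoch: Epoch) -> int:
    """Toy stand-in for a timing model: return the cycle count of one epoch.

    The function reads only the epoch's own live-in data and instructions,
    so it shares no state with other epochs.
    """
    return epoch.live_in.get("warmup_cycles", 0) + sum(epoch.instructions)


def simulate_sequential(epochs: List[Epoch]) -> int:
    """Simulate epochs one after another and sum their cycle counts."""
    return sum(simulate_epoch(e) for e in epochs)


def simulate_parallel(epochs: List[Epoch], workers: int = 4) -> int:
    """Simulate epochs concurrently; the total matches the sequential run
    because no epoch depends on another epoch's result."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(simulate_epoch, epochs))


def parallel_efficiency(speedup: float, cores: int) -> float:
    """Fraction of ideal linear speedup that was achieved."""
    return speedup / cores


if __name__ == "__main__":
    epochs = [
        Epoch({"warmup_cycles": 2}, (1, 2, 3)),
        Epoch({}, (4, 5)),
    ]
    print(simulate_sequential(epochs), simulate_parallel(epochs))
    print(parallel_efficiency(12.8, 16))
```

A thread pool is used here only to keep the sketch short and portable. In CPython, threads would not speed up CPU-bound simulation, so a real parallel run would distribute epochs across separate processes or machines.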