Weighted-Tuple: Fast and Accurate Synchronization for Parallel Architecture Simulators

Michael Moeng, Alex K Jones, Rami G Melhem

doi:10.1109/tpds.2015.2494589

Abstract

Computer architecture research relies on software simulation to evaluate processor performance. Single-threaded simulators have unacceptable simulation times when modeling complex architectures with hundreds of cores. While parallelizing a simulator can improve performance, parallel simulators must synchronize their threads, which forces them to trade performance for accuracy. We study relaxed synchronization policies for parallel architecture simulators and introduce the weighted-tuple synchronization policy, a distributed synchronization scheme that improves upon existing policies. We evaluate weighted-tuple in two parallel simulator settings: multicore simulation and network-on-chip simulation. In the multicore setting, weighted-tuple synchronization reduces average simulation time by 8 percent relative to barrier synchronization and also reduces error by 28 percent. For network-on-chip simulation, weighted-tuple synchronization improves simulation speed by 42 percent with a 0.3 percent error increase compared to the barrier baseline.

Full Text