Abstract
Non-blocking caches, commonly used in modern out-of-order processors, can handle multiple outstanding memory requests simultaneously to reduce the penalty of long-latency cache misses. Memory-level parallelism (MLP), which refers to the number of memory requests concurrently held by Miss Status Handling Registers (MSHRs), is an indispensable factor in estimating cache performance. To estimate MLP efficiently, previous studies oversimplified the factors that must be considered when constructing analytical models, especially the influence of the cache miss rate. By quantifying these cache miss rate effects, this paper proposes a mechanistic model of memory-level parallelism that is more accurate than existing approaches. Fifteen benchmarks, chosen from Mobybench 2.0, MiBench 1.0, and MediaBench II, are adopted to evaluate the accuracy of our model. Compared with Gem5 cycle-accurate simulation results, the largest root mean square error is less than 11%, while the average is around 7%. Meanwhile, the cache performance forecasting process is sped up by about 38 times compared with Gem5 cycle-accurate simulation.