Abstract

In out-of-order processors, non-blocking caches are adopted to improve memory performance by overlapping the service times of multiple cache misses. Memory Level Parallelism (MLP), an important performance metric, describes how many cache misses can be serviced concurrently. Prior studies on MLP modeling are based either on mechanistic analyses that ignore cache miss behavior or on empirical fittings that offer little insight. Our experimental results, however, show that MLP depends strongly on cache miss patterns. Based on this observation, this paper proposes an analytical model that estimates MLP more accurately and efficiently. As in previous approaches, the input to our model is statistical information about the executed software, which can be easily profiled from memory traces. Eleven benchmarks from the Mobybench suite are chosen to evaluate the model with different cache sizes. Compared with results from gem5 cycle-accurate simulations, the mean absolute error of our model is around 3.4%, significantly lower than Jian Chen's 18% and Qin Wang's 7%. Additionally, our model achieves a 35x speedup over cycle-accurate simulation, with a time overhead similar to that of Jian Chen's and Qin Wang's models.
