Abstract

Multiprocessor systems use multilevel cache hierarchies to improve overall memory access speed. Embedded systems typically use configurable processors, whose caches can be customized for a given application or set of applications. Finding an optimal or near-optimal set size, block size, and associativity for each cache in a multilevel hierarchy is challenging because the design space contains billions or even trillions of design points. This paper presents an iterative exploration method that finds suitable configurations for all the caches in the hierarchy of an application-specific multiprocessor system-on-chip, with the goal of improving memory access speed. We propose an algorithm and combine it with specialized hardware for parallel cache simulation, which makes multiple back-and-forth iterations through the cache levels feasible. In every iteration, our algorithm explores selected portions of the design space to converge quickly on the final design point. We demonstrate our methodology on a quad-core system with two- and three-level cache hierarchies containing private and shared caches, whose design spaces consist of 5.4 billion and 10.4 trillion design points, respectively. Compared to a state-of-the-art noniterative method, our method found design points with up to 18.9% lower average memory access time while reducing total cache size by up to 74.15%. Our method explored $4\times$ more design points, which is still a mere $3.6\times 10^{-5}$% of the entire design space, and the exploration took 6.08 h.
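The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of iterating back and forth through the cache levels, optimizing one level while the others stay fixed. It swaps in a plain coordinate-descent search and is not the authors' method. Everything specific here is a hypothetical assumption: the per-level option lists (`SETS`, `BLOCK_BYTES`, `WAYS`), `MEMORY_LATENCY`, the toy `hit_time` and `local_miss_rate` models (which stand in for the cache simulation the paper performs in specialized hardware), and the helper names `amat`, `cost`, and `explore`. AMAT is computed with the standard nested form $h_1 + m_1(h_2 + m_2(\dots + m_N \cdot \text{memory latency}))$ using local miss rates.

```python
import math
from itertools import product

# Hypothetical per-level design knobs: (number of sets, block size in bytes, associativity).
SETS = (64, 128, 256, 512)
BLOCK_BYTES = (16, 32, 64)
WAYS = (1, 2, 4, 8)
CONFIGS = list(product(SETS, BLOCK_BYTES, WAYS))  # 48 options per cache level
MEMORY_LATENCY = 100.0  # assumed main-memory latency in cycles


def size_bytes(cfg):
    sets, block, ways = cfg
    return sets * block * ways


def hit_time(level, cfg):
    """Toy hit-time model (cycles): deeper, larger, more associative caches are slower."""
    kb = size_bytes(cfg) / 1024
    return 1.0 + 2.0 * level + 0.3 * math.log2(kb) + 0.2 * math.log2(cfg[2])


def local_miss_rate(cfg, upper_cfg):
    """Toy local miss-rate model standing in for real cache simulation."""
    _, block, ways = cfg
    kb = size_bytes(cfg) / 1024
    if upper_cfg is None:  # first level sees the raw reference stream
        rate = 0.1 / math.sqrt(kb) * math.sqrt(32 / block) + 0.01 / ways
    else:  # lower levels only see misses filtered by the level above
        rate = 0.5 * math.sqrt(size_bytes(upper_cfg) / 1024 / kb) + 0.01 / ways
    return min(1.0, rate)


def amat(configs):
    """AMAT = h1 + m1 * (h2 + m2 * (... + mN * MEMORY_LATENCY)) with local miss rates."""
    penalty = MEMORY_LATENCY
    for level in reversed(range(len(configs))):
        upper = configs[level - 1] if level > 0 else None
        penalty = hit_time(level, configs[level]) + local_miss_rate(configs[level], upper) * penalty
    return penalty


def cost(configs):
    # Minimize AMAT first; break exact ties with smaller total cache size.
    return (amat(configs), sum(size_bytes(c) for c in configs))


def explore(num_levels, max_sweeps=100):
    """Back-and-forth coordinate search: optimize one level at a time, others fixed."""
    current = [CONFIGS[0]] * num_levels
    best = cost(current)
    evaluated = 1
    # One sweep visits L1 -> LN and back up to L1, e.g. 0, 1, 2, 1, 0 for three levels.
    order = list(range(num_levels)) + list(range(num_levels - 2, -1, -1))
    for _ in range(max_sweeps):
        changed = False
        for level in order:
            for cfg in CONFIGS:
                trial = current.copy()
                trial[level] = cfg
                evaluated += 1
                c = cost(trial)
                if c < best:  # keep only strict improvements, so the search terminates
                    best, current, changed = c, trial, True
        if not changed:  # a full sweep found no better single-level change
            return current, best, evaluated, True
    return current, best, evaluated, False


if __name__ == "__main__":
    configs, (avg_time, total_bytes), evaluated, converged = explore(num_levels=3)
    print(configs, round(avg_time, 2), total_bytes, evaluated, converged, len(CONFIGS) ** 3)
```

The search only ever evaluates a few hundred points per sweep instead of the full product of per-level options, which is the intuition behind exploring "selected portions" of the design space. Its result is a local optimum with respect to changing any single level, not a guaranteed global optimum.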
