Abstract

An on-chip multiprocessor is an alternative to the wide-issue superscalar processor, which is currently the mainstream approach to exploiting the increasing number of transistors on a silicon chip. In systems built from such on-chip multiprocessors, effective use of the cache, especially for remote data, is important because the ratio of off-chip to on-chip memory access latency is higher than in traditional board-level implementations of cache-coherent non-uniform memory access (CC-NUMA) multiprocessors. We examine two options for using the cache resources of on-chip multiprocessors, whose size is constrained by the die area: (1) caching instructions and/or private data only in the L1 cache, leaving more space in the L2 cache for shared data; and (2) either dividing the cache area between the L2 cache and a remote victim cache or devoting the entire area to the L2 cache. Execution-driven simulations show that the first option improved performance by up to 15%. For the second option, a remote victim cache 1/8 the size of the L2 cache improved performance on three of four benchmark programs by 4–8%. However, a configuration that splits the cache area equally between the L2 and victim caches was outperformed by an L2 cache occupying the entire cache area on three of four benchmark programs.
