At the Locus of Performance: Quantifying the Effects of Copious 3D-Stacked Cache on HPC Workloads

Jens Domke, Lingqi Zhang, Yuetsu Kodama, Balazs Gerofi, Miquel Pericàs, Emil Vatai, Peng Chen, Artur Podobas, Mohamed Wahib, Satoshi Matsuoka, Aleksandr Drozd, Sparsh Mittal

doi:10.1145/3629520

Abstract

Over the last three decades, innovations in the memory subsystem have primarily targeted the data movement bottleneck. In this paper, we focus on a specific market trend in memory technology: 3D-stacked memory and caches. We investigate the impact of extending the on-chip memory capabilities of future HPC-focused processors, particularly with 3D-stacked SRAM. First, we propose a method, oblivious to the memory subsystem, to gauge the upper bound on performance improvements when data movement costs are eliminated. Then, using the gem5 simulator, we model two variants of a hypothetical LARge Cache processor (LARC), fabricated in 1.5 nm and enriched with high-capacity 3D-stacked cache. Through a large volume of experiments involving a broad set of proxy-applications and benchmarks, we aim to reveal how HPC CPU performance will evolve, and we conclude that cache-sensitive HPC applications see an average boost of 9.56× on a per-chip basis. Additionally, we exhaustively document our methodological exploration to motivate HPC centers to drive their own technological agenda through enhanced co-design.

Full Text