Abstract

Over the last 25 years, the use of caches in mainstream microprocessors has advanced significantly to address the memory wall challenge. As microprocessors evolved from single-core to multicore to manycore, innovations in the architecture, design, and management of the on-die cache hierarchy were critical to scaling performance and efficiency. At the system level, as input/output (I/O) devices (e.g., networking) and domain-specific accelerators began to interact with general-purpose cores across shared memory, advances in caching became important for minimizing data movement and enabling faster communication. In this article, we cover some of the major advances in cache research and development that have improved the performance and efficiency of microprocessor servers over the last 25 years. We reflect on several techniques, including shared and distributed last-level caches (including data placement and coherence), cache Quality of Service (addressing interference between workloads), direct cache access (placing I/O data directly into CPU caches), and extending caching to off-die accelerators (CXL.cache). We also outline potential future directions for cache research and development over the next 25 years.
