A comparative analysis of microarchitecture effects on CPU and GPU memory system behavior

Joel Hestness,Stephen W Keckler,David A Wood

doi:10.1109/iiswc.2014.6983054

Abstract

While heterogeneous CPU/GPU systems have been traditionally implemented on separate chips, each with their own private DRAM, heterogeneous processors are integrating these different core types on the same die with access to a common physical memory. Further, emerging heterogeneous CPU-GPU processors promise to offer tighter coupling between core types via a unified virtual address space and cache coherence. To adequately address the potential opportunities and pitfalls that may arise from this tighter coupling, it is important to have a deep understanding of application- and memory-level demands from both CPU and GPU cores. This paper presents a detailed comparison of memory access behavior for parallel applications executing on each core type in tightly-controlled heterogeneous CPU-GPU processor simulation. This characterization indicates that applications are typically designed with similar algorithmic structures for CPU and GPU cores, and each core type's memory access path has a similar locality filtering role. However, the different core and cache microarchitectures expose substantially different memory-level parallelism (MLP), which results in different instantaneous memory access rates and sensitivity to memory hierarchy architecture.

Full Text