Abstract

The emergence of big data processing and machine learning has triggered exponential growth in the working-set sizes of applications. In addition, many modern applications are memory intensive, with irregular memory access patterns. We therefore propose the concept of adaptive granularities to develop a prefetching methodology that analyzes memory access patterns at a wider granularity, encompassing both cache-line and page granularity. The proposed prefetching module resides in the last-level cache (LLC) to handle the large working sets of memory-intensive workloads. Additionally, to support memory access streams with variable intervals, we introduce an embedded-DRAM-based LLC prefetch buffer that consists of three granularity-based prefetch engines and an access history table. By adaptively changing the granularity window used to analyze memory streams, the proposed model can swiftly and accurately determine the stride of memory addresses and generate hidden delta chains from irregular memory access patterns. The proposed model achieves 18% and 15% improvements in energy consumption and execution time over the global history buffer and best-offset prefetchers, respectively. It also reduces total execution time and energy consumption by approximately 6% and 2.3%, respectively, compared with the Markov prefetcher and the variable-length delta prefetcher.

Highlights

  • The critical performance bottleneck caused by main memory latency has been a major limitation to modern computing architectures

  • To handle such memory streams, we extend the granularity of memory access analysis to a set of several cache lines, called a correlated line, which provides a wider locality window for finding hidden memory access patterns

  • We analyze the prefetching performance of the proposed model, which combines the adaptive granularity method with an embedded-DRAM (eDRAM)-based prefetch buffer

Introduction

The critical performance bottleneck caused by main memory latency has been a major limitation of modern computing architectures. Because memory access patterns cannot be predicted exactly, prefetching is key to reducing the average memory latency. To exploit memory access patterns accurately, previous studies have suggested several methods for analyzing regular or irregular patterns using memory-address-history-based heuristic algorithms [1,2,3,4,5,6,7,8,9,10,11,12]. Generating accurate prefetch candidates requires analyzing memory access patterns in greater detail. To find stride patterns in various memory access streams, such methods record recent addresses or deltas [6] (the difference between the address values of two consecutive demand memory requests).
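To make the delta idea concrete, the following is a minimal sketch (not the paper's proposed design) of a history-based prefetcher that records the delta between consecutive demand addresses and issues a prefetch when the most recent delta repeats; the class name, history length, and example address stream are illustrative assumptions.

```python
from collections import deque

class DeltaPrefetcher:
    """Illustrative delta-history prefetcher, not the paper's model."""

    def __init__(self, history_len=4):
        self.last_addr = None
        self.deltas = deque(maxlen=history_len)  # recent delta history

    def access(self, addr):
        """Record a demand access; return prefetch candidate addresses."""
        prefetches = []
        if self.last_addr is not None:
            delta = addr - self.last_addr
            self.deltas.append(delta)
            # If the two most recent deltas match, assume a stride
            # pattern and prefetch the next address in the chain.
            if len(self.deltas) >= 2 and self.deltas[-1] == self.deltas[-2]:
                prefetches.append(addr + delta)
        self.last_addr = addr
        return prefetches

# Example: a regular stream with a stride of 0x40 (one 64-byte cache line).
pf = DeltaPrefetcher()
stream = [0x1000, 0x1040, 0x1080, 0x10C0]
issued = [p for a in stream for p in pf.access(a)]
# Once the delta 0x40 has repeated, the prefetcher starts issuing
# candidates one stride ahead of each demand access.
```

Real designs extend this skeleton with longer delta chains and confidence counters; the sketch only shows how a delta history exposes a stride hidden in the raw address stream.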
