The Entangling Instruction Prefetcher

Alberto Ros, Alexandra Jimborean

doi:10.1109/lca.2020.3002947

Abstract

Prefetching instructions is a fundamental technique for designing high-performance computers. There are three key properties to consider when designing an efficient and effective prefetcher: timeliness, coverage, and accuracy. Timeliness is essential: bringing instructions in too early increases the risk that they are evicted from the cache before use, while requesting them too late can cause them to arrive after they are needed for execution. Coverage is important to reduce the number of instruction cache misses (there is enough prefetching), and accuracy ensures that the prefetcher does not pollute the cache or interact negatively with other hardware mechanisms (there is not too much prefetching). This letter presents the Entangling instruction prefetcher, which entangles instructions to provide timeliness. The prefetcher works by finding which instruction should trigger the prefetch for a subsequent instruction, accounting for the latency of each cache miss. The prefetcher is carefully adjusted to account for both coverage and accuracy. Our evaluation shows that the Entangling I-prefetcher increases performance by 29.3 percent on average, with a coverage of 94.9 percent and accuracy of 77.4 percent.
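The idea of choosing a trigger instruction based on miss latency can be illustrated with a small software model. The sketch below is not the authors' hardware design; the class name EntanglingSketch, its methods, the history size, and the use of cycle counts and cache-line addresses are all assumptions made for illustration. It only shows the principle stated in the abstract: when a line misses, look back for a line that was fetched at least one miss latency earlier, and let that earlier line trigger the prefetch next time.

```python
from collections import deque


class EntanglingSketch:
    """Toy model (hypothetical, not the paper's design): pair each missing
    cache line with an earlier 'source' line that was fetched at least one
    miss latency before it."""

    def __init__(self, history_size=16):
        # Recent fetches as (cycle, line), oldest first. Size is an assumption.
        self.history = deque(maxlen=history_size)
        # Maps a source line to the set of lines it should prefetch.
        self.entangled = {}

    def on_access(self, cycle, line):
        """Record a fetch and return the lines this source should prefetch."""
        self.history.append((cycle, line))
        return sorted(self.entangled.get(line, ()))

    def on_miss_resolved(self, miss_cycle, line, latency):
        """Entangle `line` with the youngest earlier line that is old enough
        to hide `latency` cycles. Returns the chosen source, or None."""
        deadline = miss_cycle - latency
        for cycle, src in reversed(self.history):
            if cycle <= deadline and src != line:
                self.entangled.setdefault(src, set()).add(line)
                return src
        return None


if __name__ == "__main__":
    p = EntanglingSketch()
    p.on_access(0, 0xA)
    p.on_access(10, 0xB)
    p.on_access(20, 0xC)                      # assume 0xC misses here
    source = p.on_miss_resolved(20, 0xC, 12)  # assume the miss took 12 cycles
    print(hex(source), [hex(l) for l in p.on_access(100, 0xA)])
```

In this example, line 0xB was fetched only 10 cycles before the miss, which is too late to hide a 12-cycle latency, so the model entangles 0xC with the earlier line 0xA. Picking the youngest source that is still early enough is one plausible way to balance the two timeliness risks described above (too early versus too late).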

Highlights

Instruction fetch stalls block the processor pipeline, causing significant performance degradation
Applications with large working instruction sets that do not fit in the first level cache, such as server applications or applications designed to run in the Cloud, exhibit large instruction-cache miss rates and incur more stalls
Return-address stack-directed instruction prefetching (RDIP) [3] captures the context of a miss caused by a function call as a signature, which is consulted upon each call and return operation to trigger prefetching

Summary

Introduction

Instruction fetch stalls block the processor pipeline, causing significant performance degradation. As memory latency has been recognized as a critical factor for performance, prefetching techniques have emerged to install data or instructions in the cache ahead of time, ready to be used when demanded by the processor [1]. Driven by their impact on performance, prefetchers have evolved from simple line prefetchers to complex techniques such as the Proactive Instruction Fetch prefetcher [2], which captures the blocks accessed by committed instructions and by instructions from handlers for OS interrupts. The Entangling I-prefetcher is robust and effective, agnostic to the application characteristics, and achieves a 99.3 percent I-hit rate, approaching the perfect L1-I.

Methods

Results

Conclusion