Data prefetching, i.e., the act of predicting an application's future memory accesses and fetching into the on-chip caches those that are not already present, is a well-known and widely used approach to hiding the long latency of memory accesses. The effectiveness of data prefetching is evident to both industry and academia: nowadays, almost every high-performance processor incorporates a few data prefetchers to capture the various access patterns of applications; moreover, the research literature offers a myriad of data prefetching proposals, each enhancing the efficiency of prefetching in a specific way. In this survey, we evaluate the effectiveness of data prefetching in the context of server applications and shed light on its design trade-offs. To do so, we choose a target architecture based on a contemporary server processor and stack various state-of-the-art data prefetchers on top of it. We analyze the prefetchers in terms of their ability to predict memory accesses and improve overall system performance, as well as the overheads they impose. Finally, by comparing the state-of-the-art prefetchers against impractical, ideal prefetchers, we motivate further work on improving data prefetching techniques.