Fetch Architecture Research Articles

In the pursuit of instruction-level parallelism, significant demands are placed on a processor's instruction delivery mechanism. Delivering the performance necessary to meet future processor execution targets requires that the performance of the instruction delivery mechanism scale with the execution core. Attaining these targets is a challenging task due to I-cache misses, branch mispredictions, and taken branches in the instruction stream. To counter these challenges, we present a fetch architecture that decouples the branch predictor from the instruction fetch unit. A Fetch Target Queue (FTQ) is inserted between the branch predictor and instruction cache. This allows the branch predictor to run far in advance of the address currently being fetched by the cache. The decoupling enables a number of architecture optimizations, including multilevel branch predictor design, fetch-directed instruction prefetching, and easier pipelining of the instruction cache. For the multilevel predictor, we show that it performs better than a single-level predictor, even when ignoring the effects of cycle-timing issues. We also examine the performance of fetch-directed instruction prefetching using a multilevel branch predictor and show that an average 19 percent speedup is achieved. In addition, we examine pipelining the instruction cache to achieve a faster cycle time for the processor pipeline and show that pipelining provides an average 27 percent speedup over not pipelining the instruction cache for the programs examined.
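The decoupling described in this abstract can be illustrated with a toy cycle-level model. The sketch below is not the authors' simulator: the function `simulate`, its parameters (`miss_blocks`, `miss_latency`, `ftq_size`), and the one-block-per-cycle timing are all assumptions made for illustration. It only shows the basic idea that a branch predictor can keep filling a Fetch Target Queue while the instruction cache is stalled on a miss, so that queued addresses are known ahead of time (which is what would make fetch-directed prefetching possible).

```python
from collections import deque


def simulate(fetch_blocks, miss_blocks, miss_latency=3, ftq_size=4):
    """Toy model of a decoupled front end (all names are hypothetical).

    fetch_blocks: predicted fetch-block addresses, in program order.
    miss_blocks:  addresses assumed to miss in the instruction cache.
    Returns (fetched, max_occupancy), where fetched is a list of
    (cycle, address) pairs recording when each block finished fetching.
    """
    ftq = deque()                  # Fetch Target Queue (FIFO)
    pending = deque(fetch_blocks)  # blocks the predictor has yet to emit
    fetched = []
    max_occupancy = 0
    stall = 0                      # remaining cycles of an in-flight miss
    current = None
    cycle = 0

    while pending or ftq or stall:
        # Instruction cache side: finish a miss, or start the next block.
        if stall > 0:
            stall -= 1
            if stall == 0:
                fetched.append((cycle, current))
        elif ftq:
            current = ftq.popleft()
            if current in miss_blocks:
                stall = miss_latency
            else:
                fetched.append((cycle, current))

        # Predictor side: runs ahead whenever the queue has room,
        # regardless of whether the cache is stalled.
        if pending and len(ftq) < ftq_size:
            ftq.append(pending.popleft())

        max_occupancy = max(max_occupancy, len(ftq))
        cycle += 1

    return fetched, max_occupancy


if __name__ == "__main__":
    blocks = [0x100, 0x120, 0x140, 0x160]
    fetched, occupancy = simulate(blocks, miss_blocks={0x120})
    for cycle, addr in fetched:
        print(f"cycle {cycle}: fetched block {addr:#x}")
    print("peak FTQ occupancy:", occupancy)
```

In this model the queue stays nearly empty when every block hits, and fills up while a miss is outstanding. The entries that accumulate during the stall are the addresses a prefetcher could act on in a real design.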

The design of higher performance processors has been following two major trends: increasing the pipeline depth to allow faster clock rates, and widening the pipeline to allow parallel execution of more instructions. Designing a higher performance processor implies balancing all the pipeline stages to ensure that overall performance is not dominated by any of them. This means that a faster execution engine also requires a faster fetch engine, to ensure that it is possible to read and decode enough instructions to keep the pipeline full and the functional units busy. This paper explores the challenges faced by the instruction fetch stage for a variety of processor designs, from early pipelined processors to the more aggressive wide-issue superscalars. We describe the different fetch engines proposed in the literature, the performance issues involved, and some of the proposed improvements. We also show how compiler techniques that optimize the layout of the code in memory can be used to improve the fetch performance of the different engines described. Overall, we show how instruction fetch has evolved from fetching one instruction every few cycles, to fetching one instruction per cycle, to fetching a full basic block per cycle, to several basic blocks per cycle: the evolution of the mechanism surrounding the instruction cache, and the different compiler optimizations used to better employ these mechanisms.

Articles published on Fetch Architecture

DIA: A Complexity-Effective Decoding Architecture

Enlarging Instruction Streams

A low-complexity fetch architecture for high-performance superscalar processors

Using a serial cache for energy efficient instruction fetching

Optimizations enabled by a decoupled front-end architecture

Instruction fetch architectures and code layout optimizations

A scalable front-end architecture for fast instruction delivery
