Reducing reorder buffer complexity through selective operand caching

Gurhan Kucuk,Kanad Ghose,Oguz Ergin,Dmitry Ponomarev

doi:10.1145/871506.871564

Abstract

Modern superscalar processors implement precise interrupts by using the Reorder Buffer (ROB). In some microarchitectures , such as the Intel P6, the ROB also serves as a repository for the uncommitted results. In these designs, the ROB is a complex multi-ported structure that dissipates a significant percentage of the overall chip power. Recently, a mechanism was introduced for reducing the ROB complexity and its power dissipation through the complete elimination of read ports for reading out source operands. The resulting performance degradation is countered by caching the most recently produced results in a small set of associatively-addressed latches (retention latches). We propose an enhancement to the above technique by leveraging the notion of short-lived operands (values targeting the registers that are renamed by the time the instruction producing the value reaches the writeback stage). As much as 87% of all generated values are short lived for the SPEC 2000 benchmarks. Significant improvements in the utilization of retention latches, the overall performance, complexity and power are achieved by not caching short-lived values in the retention latches. As few as two retention latches allow all source operand read ports on the ROB to be completely eliminated with very little impact on performance.

Full Text