Abstract

The pipeline stages of a modern CPU can be roughly classified into front-end and back-end stages. The front end supplies ready (decoded and renamed) instructions and dispatches them to reservation stations, from which the back end issues, executes, and retires them. The lengthy front-end stages, including instruction fetch, decode, rename, and dispatch, play a key role in overall performance: only an adequate supply of ready instructions allows the back-end stages to fully exploit instruction-level parallelism (ILP). Reducing front-end latency is especially critical for recent deeply pipelined architectures, where the front end is particularly long: an instruction cache access may take more than one cycle even on a hit, let alone a miss. On a branch misprediction, the supply/demand equilibrium between the front end and the back end is suddenly disrupted: the back end often under-utilizes its resources during the long wait until the front end can supply instructions from the newly resolved path, ready in the reservation stations. In this paper, we introduce and evaluate a new mechanism, called SuperCache, that reduces front-end latency by extending the traditional reservation pool into a SuperCache and recycling retired reservation stations. With the proposed mechanism, our simulations show a significant performance improvement of up to 15%, and even 30% in some cases.
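
The abstract does not specify SuperCache's organization, so the following is only a minimal illustrative sketch under assumed details (size, direct-mapped indexing, and the `rs_entry_t` fields are all hypothetical). It models the stated idea: retired reservation-station entries are recycled into a PC-indexed pool so a later fetch of the same address can reuse the already-decoded form instead of paying full front-end latency.

```c
/*
 * Sketch only: structure, sizes, and lookup policy are assumptions,
 * not the paper's actual design.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define SUPERCACHE_SETS 256          /* assumed capacity, direct-mapped for brevity */

typedef struct {
    bool     valid;
    uint64_t pc;                     /* instruction address */
    uint32_t decoded_uop;            /* already decoded/renamed micro-op payload */
} rs_entry_t;

static rs_entry_t supercache[SUPERCACHE_SETS];

/* Retire stage: recycle the finished RS entry instead of discarding it. */
static void retire_and_recycle(const rs_entry_t *rs)
{
    supercache[rs->pc % SUPERCACHE_SETS] = *rs;
}

/* Front end: try the SuperCache first; on a miss, the normal multi-cycle
 * fetch/decode/rename path would be taken. */
static bool supply_instruction(uint64_t pc, rs_entry_t *out)
{
    rs_entry_t *hit = &supercache[pc % SUPERCACHE_SETS];
    if (hit->valid && hit->pc == pc) {
        *out = *hit;                 /* decoded form reused, front end bypassed */
        return true;
    }
    return false;
}

int main(void)
{
    rs_entry_t e = { .valid = true, .pc = 0x400123, .decoded_uop = 42 };
    retire_and_recycle(&e);

    rs_entry_t again;
    if (supply_instruction(0x400123, &again))
        printf("SuperCache hit: uop %u reused without re-decoding\n",
               again.decoded_uop);
    return 0;
}
```

Such reuse would pay off most for loop bodies and for refetching instructions along the corrected path after a branch misprediction, which is where the abstract locates the supply/demand imbalance.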
