Abstract

The pipeline stages of a modern CPU can be roughly classified into front-end and back-end stages. The front end supplies ready (decoded and renamed) instructions and dispatches them to reservation stations, from which the back end issues, executes, and retires them. The lengthy front-end stages, including instruction fetch, decode, rename, and dispatch, play a key role in overall performance: only an adequate supply of ready instructions allows the back-end stages to fully exploit instruction-level parallelism (ILP). Reducing front-end latency is especially critical for recent deeply pipelined architectures, where the front end is particularly long: an instruction cache access may take more than one cycle even on a hit, let alone a miss. On a branch misprediction, the supply/demand equilibrium between the front end and the back end is suddenly disrupted: the back end often under-utilizes its resources during the long wait until the front end can supply instructions from the newly resolved path, ready in the reservation stations. In this paper, we introduce and evaluate a new mechanism, called SuperCache, that reduces front-end latency by extending the traditional reservation pool into a SuperCache and recycling retired reservation stations. With the proposed mechanism, our simulations show a significant performance improvement of up to 15%, and even 30% in some cases.
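
The abstract does not specify SuperCache's organization, so the following is only a minimal illustrative sketch under assumed details (size, direct-mapped indexing, and the `rs_entry_t` fields are all hypothetical). It models the stated idea: retired reservation-station entries are recycled into a PC-indexed pool so a later fetch of the same address can reuse the already-decoded form instead of paying full front-end latency.

```c
/*
 * Sketch only: structure, sizes, and lookup policy are assumptions,
 * not the paper's actual design.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define SUPERCACHE_SETS 256          /* assumed capacity, direct-mapped for brevity */

typedef struct {
    bool     valid;
    uint64_t pc;                     /* instruction address */
    uint32_t decoded_uop;            /* already decoded/renamed micro-op payload */
} rs_entry_t;

static rs_entry_t supercache[SUPERCACHE_SETS];

/* Retire stage: recycle the finished RS entry instead of discarding it. */
static void retire_and_recycle(const rs_entry_t *rs)
{
    supercache[rs->pc % SUPERCACHE_SETS] = *rs;
}

/* Front end: try the SuperCache first; on a miss, the normal multi-cycle
 * fetch/decode/rename path would be taken. */
static bool supply_instruction(uint64_t pc, rs_entry_t *out)
{
    rs_entry_t *hit = &supercache[pc % SUPERCACHE_SETS];
    if (hit->valid && hit->pc == pc) {
        *out = *hit;                 /* decoded form reused, front end bypassed */
        return true;
    }
    return false;
}

int main(void)
{
    rs_entry_t e = { .valid = true, .pc = 0x400123, .decoded_uop = 42 };
    retire_and_recycle(&e);

    rs_entry_t again;
    if (supply_instruction(0x400123, &again))
        printf("SuperCache hit: uop %u reused without re-decoding\n",
               again.decoded_uop);
    return 0;
}
```

Such reuse would pay off most for loop bodies and for refetching instructions along the corrected path after a branch misprediction, which is where the abstract locates the supply/demand imbalance.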
