Two-Step Physical Register Deallocation for Data Prefetching and Address Pre-Calculation

Akihiro Yamamoto, Toshio Shimada, Yusuke Tanaka, Hideki Ando

doi:10.2197/ipsjtrans.1.94


Open Access

https://doi.org/10.2197/ipsjtrans.1.94


Abstract

This paper proposes an instruction pre-execution scheme for high-performance processors that reduces latency and enables early scheduling of loads. The scheme exploits the difference between the amount of instruction-level parallelism available with an unlimited number of physical registers and that available with the actual number of physical registers. We introduce a two-step physical register deallocation scheme. As the first step, it deallocates physical registers at the renaming stage, which eliminates pipeline stalls caused by a shortage of physical registers. Instructions then wait in the instruction window for the final deallocation, which is the second step. While they wait, the scheme allows them to be pre-executed, which enables prefetching of load data and early calculation of effective memory addresses. Our evaluation results show that the scheme improves performance significantly, achieving a 1.26 times speedup over a processor without a prefetcher. When combined with a stride prefetcher, it achieves a 1.18 times speedup over a processor with a stride prefetcher.
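To put the reported speedups in terms of execution time, the following minimal sketch converts a speedup factor into the fraction of baseline execution time saved. It assumes that speedup is defined as baseline execution time divided by execution time with the scheme, which the abstract does not state explicitly; the function name `time_reduction` is hypothetical and used only for illustration.

```python
def time_reduction(speedup: float) -> float:
    """Return the fraction of baseline execution time saved.

    Assumption (not stated in the abstract): speedup is the ratio
    baseline_time / new_time, so new_time = baseline_time / speedup.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup


# Speedup factors quoted in the abstract.
no_prefetcher = time_reduction(1.26)   # versus a processor without a prefetcher
with_stride = time_reduction(1.18)     # versus a processor with a stride prefetcher

print(f"{no_prefetcher:.1%}")  # prints 20.6%
print(f"{with_stride:.1%}")    # prints 15.3%
```

Under that assumption, the 1.26 times speedup corresponds to roughly a fifth less execution time, and the 1.18 times speedup to roughly 15% less.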

Full Text