Exploiting selective instruction reuse and value prediction in a superscalar architecture

Arpad Gellert, Lucian Vintan, Adrian Florea

doi:10.1016/j.sysarc.2008.11.002

Abstract

In our previously published research, we identified some very difficult-to-predict branches, which we called unbiased branches. Since misprediction recovery seriously affects the overall performance of modern processors, these difficult branches are an especially important source of performance penalties. Our statistics show that about 28% of branches depend on critical Load instructions, and 5.61% of branches are both unbiased and dependent on critical Loads. Similarly, about 21% of branches depend on MUL/DIV instructions, while 3.76% are both unbiased and dependent on MUL/DIV instructions. These dependences lead to high-penalty mispredictions, which become serious performance obstacles and cause significant performance degradation through the execution of wrong-path instructions. Therefore, the negative impact of (unbiased) branches on overall performance could be significantly attenuated by anticipating the results of long-latency instructions, including critical Loads. In addition, hiding the long latencies of instructions in a pipelined superscalar processor is an important challenge in itself. We developed a superscalar architecture that selectively anticipates the values produced by high-latency instructions. In this work, we focus on multiplications, divisions, and Loads that miss in the L1 data cache, implementing a dynamic instruction reuse scheme for MUL/DIV instructions and a simple last value predictor for critical Load instructions. Our improved superscalar architecture achieves an average IPC speedup of 3.5% on the SPEC 2000 integer benchmarks and 23.6% on the floating-point benchmarks, with energy-delay product (EDP) improvements of 6.2% and 34.5%, respectively. We also quantified the impact of our selective instruction reuse and value prediction techniques in a simultaneous multithreading (SMT) architecture that uses per-thread reuse buffers and load value prediction tables.
Our simulation results show that the best improvements on the SPEC integer applications were obtained with 2 threads: an IPC speedup of 5.95% and an EDP gain of 10.44%. Although the enhanced superscalar architecture yields the highest improvements on the SPEC floating-point programs, the SMT with 3 threads also provides a substantial IPC speedup of 16.51% and an EDP gain of 25.94%.
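The two mechanisms named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' simulator: the table sizes, the PC-based indexing, the tag layout, and the names `ReuseBuffer`, `LastValuePredictor`, and `predict_critical_load` are all hypothetical. The reuse buffer is non-speculative: it returns a stored MUL/DIV result only when the PC and both current operand values match a previous execution, so the result is exact. The last value predictor is speculative: it guesses that a Load will return the same value it returned last time, and that guess must be verified (and a misprediction recovered) once the Load completes. Following the abstract, only Loads that miss in the L1 data cache consult the predictor.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class ReuseBuffer:
    """Hypothetical direct-mapped reuse buffer for MUL/DIV instructions.

    Each entry stores (pc, operand1, operand2, result) from a previous
    execution. Reuse is exact: a hit requires the same PC and the same
    operand values, so the stored result needs no verification.
    """
    num_entries: int = 1024  # assumed size, not taken from the paper
    entries: Dict[int, Tuple[int, int, int, int]] = field(default_factory=dict)

    def _index(self, pc: int) -> int:
        # Drop the low bits of a 4-byte-aligned PC, then wrap to the table size.
        return (pc >> 2) % self.num_entries

    def lookup(self, pc: int, op1: int, op2: int) -> Optional[int]:
        """Return the stored result on a reuse hit, else None."""
        entry = self.entries.get(self._index(pc))
        # The full PC is kept as a tag, so index aliasing cannot cause false reuse.
        if entry is not None and entry[:3] == (pc, op1, op2):
            return entry[3]
        return None

    def update(self, pc: int, op1: int, op2: int, result: int) -> None:
        """Record a result computed by the MUL/DIV unit (replaces any old entry)."""
        self.entries[self._index(pc)] = (pc, op1, op2, result)


@dataclass
class LastValuePredictor:
    """Hypothetical PC-indexed last value prediction table for Loads."""
    num_entries: int = 1024  # assumed size, not taken from the paper
    table: Dict[int, Tuple[int, int]] = field(default_factory=dict)

    def _index(self, pc: int) -> int:
        return (pc >> 2) % self.num_entries

    def predict(self, pc: int) -> Optional[int]:
        """Speculatively predict the value this Load returned last time."""
        entry = self.table.get(self._index(pc))
        if entry is not None and entry[0] == pc:
            return entry[1]
        return None

    def update(self, pc: int, actual: int) -> bool:
        """Store the actual loaded value; return True if the prediction was correct."""
        predicted = self.predict(pc)
        self.table[self._index(pc)] = (pc, actual)
        return predicted == actual


def predict_critical_load(lvp: LastValuePredictor, pc: int, l1_miss: bool) -> Optional[int]:
    """Selective prediction: only Loads that miss in L1 consult the table."""
    return lvp.predict(pc) if l1_miss else None


if __name__ == "__main__":
    rb = ReuseBuffer()
    rb.update(0x400, 6, 7, 42)            # MUL at PC 0x400 computed 6 * 7
    print(rb.lookup(0x400, 6, 7))         # 42: same operands, result reused
    print(rb.lookup(0x400, 6, 8))         # None: operands differ, must execute

    lvp = LastValuePredictor()
    lvp.update(0x800, 5)                  # first execution trains the table
    print(predict_critical_load(lvp, 0x800, l1_miss=True))   # 5
    print(predict_critical_load(lvp, 0x800, l1_miss=False))  # None: L1 hit, no prediction
```

For the SMT configuration described above, one would instantiate a separate `ReuseBuffer` and `LastValuePredictor` per hardware thread, for example in a dictionary keyed by thread ID, so that threads do not pollute each other's entries; this too is an illustration rather than the paper's exact design.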

Journal: Journal of Systems Architecture
Publication Date: Dec 3, 2008
Citations: 21
