AVPP

Lois Orosa,Onur Mutlu,Rodolfo Azevedo

doi:10.1145/3239567

Abstract

Value prediction improves instruction level parallelism in superscalar processors by breaking true data dependencies. Although this technique can significantly improve overall performance, most of the state-of-the-art value prediction approaches require high hardware cost, which is the main obstacle for its wide adoption in current processors. To tackle this issue, we revisit load value prediction as an efficient alternative to the classical approaches that predict all instructions. By speculating only on loads, the pressure over shared resources (e.g., the Physical Register File) and the predictor size can be substantially reduced (e.g., more than 90% reduction compared to recent works). We observe that existing value predictors cannot achieve very high performance when speculating only on load instructions. To solve this problem, we propose a new, accurate and low-cost mechanism for predicting the values of load instructions: the Address-first Value-next Predictor with Value Prefetching (AVPP). The key idea of our predictor is to predict the load address first (which, we find, is much more predictable than the value) and to use a small non-speculative Value Table (VT)—indexed by the predicted address—to predict the value next. To increase the coverage of AVPP, we aim to increase the hit rate of the VT by predicting also the load address of a future instance of the same load instruction and prefetching its value in the VT. We show that AVPP is relatively easy to implement, requiring only 2.5% of the area of a 32KB L1 data cache. We compare our mechanism with five state-of-the-art value prediction techniques, evaluated within the context of load value prediction, in a relatively narrow out-of-order processor. On average, our AVPP predictor achieves 11.2% speedup and 3.7% of energy savings over the baseline processor, outperforming all the state-of-the-art predictors in 16 of the 23 benchmarks we evaluate. We evaluate AVPP implemented together with different prefetching techniques, showing additive performance gains (20% average speedup). In addition, we propose a new taxonomy to classify different value predictor policies regarding predictor update, predictor availability, and in-flight pending updates. We evaluate these policies in detail.

Full Text