DTM@GPU: Characterizing and evaluating trace redundancy in GPU

Leandro A J Marzulo, Igor M Coelho, Alexandre C Sena, Saulo T Oliveira, Cristiana Bentes, Tiago A O Alves, Felipe M G França, Maria Clicia S De Castro, Alexandre S Nery

doi:10.1002/cpe.4450

Abstract

In a program, a significant number of instructions are usually executed repeatedly with the same inputs. This redundancy allows previous computations to be reused, potentially reducing the program execution time. The Dynamic Trace Memoization (DTM) technique was proposed to exploit the reuse of dynamic sequences of redundant instructions on superscalar CPUs. This paper proposes applying the DTM technique to a GPU architecture. We propose the DTM@GPU model, which adapts the original DTM technique to the NVIDIA GPU architecture by introducing architectural modifications and identifying different trace reuse styles in multithreaded environments. We investigate reuse opportunities in real-world GPU applications and the potential performance gains. We also characterize the reused traces in detail. This characterization shows the number and size of the reused traces, the influence of the cache size on reuse rates, and the cycles that are saved when all threads in a warp reuse instructions or traces. The results show up to approximately 35.3% reuse, yielding an estimated speedup of 10.7%.
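To make the notion of redundancy concrete, the following is a minimal sketch of instruction-level reuse, the simpler building block that trace reuse extends. It is not the DTM@GPU design: the instruction format, the `OPS` table, the `run_with_reuse` function, and the unbounded memo table are all hypothetical simplifications chosen for illustration. An instruction is treated as redundant when the same program counter is seen again with the same operand values, in which case the stored result is returned instead of being recomputed.

```python
from typing import Dict, List, Tuple

# Hypothetical toy instruction: (program counter, opcode, operand values).
Instr = Tuple[int, str, Tuple[int, ...]]

# Hypothetical two-operand opcodes, for illustration only.
OPS = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
}


def run_with_reuse(stream: List[Instr]) -> Tuple[List[int], int]:
    """Execute a dynamic instruction stream with a memo table.

    Returns the result of every instruction and the number of
    instructions whose result was reused rather than recomputed.
    """
    # Memo table keyed by (pc, operand values). It is unbounded here;
    # a hardware table would have a fixed size and a replacement policy.
    memo: Dict[Tuple[int, Tuple[int, ...]], int] = {}
    results: List[int] = []
    reused = 0
    for pc, opcode, operands in stream:
        key = (pc, operands)
        if key in memo:
            # Same instruction, same inputs: reuse the stored result.
            results.append(memo[key])
            reused += 1
        else:
            value = OPS[opcode](*operands)
            memo[key] = value
            results.append(value)
    return results, reused


stream: List[Instr] = [
    (0, "add", (1, 2)),
    (1, "mul", (3, 4)),
    (0, "add", (1, 2)),  # redundant: same pc and same inputs
    (1, "mul", (3, 5)),  # same pc, different inputs: must execute
    (0, "add", (1, 2)),  # redundant again
]

results, reused = run_with_reuse(stream)
reuse_rate = reused / len(stream)
print(results, reused, reuse_rate)
```

In this toy stream, two of the five dynamic instructions hit the memo table. Trace-level reuse would go one step further and skip a whole sequence of such instructions with a single lookup when all of the sequence's inputs match.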

Full Text