Enabling Portable Optimizations of Data Placement on GPU

Guoyang Chen,Bo Wu,Xipeng Shen,Dong Li

doi:10.1109/mm.2015.53

Abstract

Modern GPU memory systems manifest more varieties, increasing complexities, and rapid changes. Different placements of data on memory systems often cause significant differences in program performance. Most current GPU programming systems rely on programmers to indicate the appropriate placements, but finding the appropriate placements is difficult for programmers in practice owing to the complexity and fast changes of memory systems, as well as the input sensitivity of appropriate data placements--that is, the best placements often differ when a program runs on a different input data set. This article introduces a software framework called Porple. It offers a solution that, for the first time, makes it possible to automatically enhance data placement across a GPU. Through Porple, a GPU program's data gets placed appropriately on memory on the fly, customized to the current input dataset. Moreover, when new memory systems arrive, it can easily adapt the placements accordingly. Experiments on three types of GPU systems show that Porple consistently finds optimal or near-optimal placement, yielding up to 2.93 times (1.75 times average on three generations of GPU) speedups compared to programmers' decisions.

Full Text