Stack-based parallel recursion on graphics processors

Ke Yang, Jiaoying Shi, Bingsheng He, Pedro V Sander, Qiong Luo

doi:10.1145/1594835.1504224

Abstract

Recent research has shown promising results on using graphics processing units (GPUs) to accelerate general-purpose computation. However, today's GPUs do not support recursive functions. As a result, for inherently recursive algorithms such as tree traversal, GPU programmers need to explicitly use stacks to emulate the recursion. Parallelizing such stack-based implementations on the GPU increases the programming difficulty; moreover, it is unclear how to improve the efficiency of such parallel implementations. As a first step toward addressing both the ease-of-programming and efficiency issues, we propose three parallel stack implementation alternatives that differ in the granularity of stack sharing. Taking tree traversals as an example, we study the performance tradeoffs between these alternatives and analyze their behavior in various situations. Our results could be useful to both GPU programmers and GPU compiler writers.
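
To make the idea of emulating recursion with an explicit stack concrete, here is a minimal host-side C++ sketch of a pre-order tree traversal. It is an illustration only and is not one of the three stack implementations proposed in the paper: the `Node` layout, the `kMaxDepth` bound, and the `preorder` function are all hypothetical names chosen for this example. The fixed-size local array stands in for the kind of private stack a single thread might keep when recursion is unavailable.

```cpp
#include <cassert>
#include <vector>

// Hypothetical node layout: children are indices into the node array,
// and -1 means "no child". Index-based links avoid pointers, which is
// a common choice for GPU-friendly data, but it is an assumption here.
struct Node {
    int key;
    int left;
    int right;
};

// Assumed upper bound on the number of pending nodes (tree height + 1).
constexpr int kMaxDepth = 64;

// Pre-order traversal without recursion: the explicit stack holds the
// indices of subtrees that still have to be visited.
std::vector<int> preorder(const std::vector<Node>& nodes, int root) {
    int stack[kMaxDepth];
    int top = 0;  // number of entries currently on the stack
    std::vector<int> visited;

    if (root >= 0) {
        stack[top++] = root;
    }
    while (top > 0) {
        const Node& n = nodes[stack[--top]];  // pop
        visited.push_back(n.key);
        // Push right first so that the left subtree is popped first.
        if (n.right >= 0) {
            assert(top < kMaxDepth);
            stack[top++] = n.right;
        }
        if (n.left >= 0) {
            assert(top < kMaxDepth);
            stack[top++] = n.left;
        }
    }
    return visited;
}
```

In this sketch every traversal owns its stack outright. The alternatives studied in the paper differ in how such a stack is shared among parallel threads, which this single-threaded example does not attempt to model.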

Full Text