Abstract

There is a stage in the GPU computing pipeline where a grid of thread-blocks, or space of computation, is mapped to the problem domain. Normally, the space of computation is a k-dimensional bounding box (BB) that covers a k-dimensional problem. Threads that fall inside the problem domain perform computations, while threads that fall outside are discarded, all at runtime. For problems with non-square geometry, this approach makes the space of computation larger than necessary, wasting many threads. Our case of interest is two-dimensional triangular-domain problems, also known as td-problems, where almost half of the space of computation is unnecessary under the BB approach. Problems such as the Euclidean distance map or collision detection are td-problems, and they frequently appear as part of a larger computational problem. In this work, we study several mapping functions and their contribution to a better space of computation by reducing the number of unnecessary threads. We compare the performance of four existing mapping strategies: the bounding box (BB), the upper-triangular mapping (UTM), the rectangular box (RB), and the recursive partition (REC). In addition, we propose a map g(λ) that maps any block index λ to a unique location (i, j) in the triangular domain. The mapping is based on the properties of the lower-triangular matrix and works in block space. The theoretical improvement I obtained from using g(λ) is upper bounded by I < 2, and the number of unnecessary blocks is reduced from O(n²) to O(n). Experimental results using different Nvidia Kepler GPUs show that, for computing the Euclidean distance matrix, g(λ) achieves an improvement of up to 18% over the basic bounding box (BB) strategy, runs faster than the UTM and REC strategies, and is almost as fast as RB. Performance results on shared-memory 3D collision detection show that g(λ) is the fastest map of all, and the only one capable of surpassing the brute-force (BB) approach, by a margin of up to 7%. These results show that one of the main advantages of g(λ) is that it works in block space, where coordinate values are small in magnitude and thread organization is not compromised, making the map stable in performance under different memory access patterns.
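The abstract does not give the closed form of g(λ). As a rough illustration of the block-space idea only, the sketch below uses the standard inverse triangular-number formula, i = ⌊(√(8λ+1) − 1)/2⌋ and j = λ − i(i+1)/2, to turn a one-dimensional grid of "useful" blocks into lower-triangular tile coordinates. The kernel name, tile size, rounding guards, and placeholder computation are illustrative assumptions, not the paper's actual implementation.

```cuda
// Minimal sketch, not the authors' exact kernel: one plausible block-space map
// from a linear block index lambda to lower-triangular tile coordinates (i, j),
// using the standard inverse triangular-number formula
//   i = floor((sqrt(8*lambda + 1) - 1) / 2),   j = lambda - i*(i + 1)/2.
#include <cuda_runtime.h>

__device__ __forceinline__ uint2 lambda_to_ij(unsigned int lambda)
{
    float r = sqrtf(8.0f * (float)lambda + 1.0f);
    unsigned int i = (unsigned int)((r - 1.0f) * 0.5f);
    // Guard against floating-point rounding in either direction.
    if (i * (i + 1) / 2 > lambda)        --i;
    if ((i + 1) * (i + 2) / 2 <= lambda) ++i;
    return make_uint2(i, lambda - i * (i + 1) / 2);
}

// Illustrative kernel: a 1D grid launches only the triangular tiles; each block
// maps itself to tile (i, j) and processes that tile of the problem domain.
__global__ void td_kernel(float *out, int n, int tile)
{
    uint2 b = lambda_to_ij(blockIdx.x);
    int row = b.x * tile + threadIdx.y;   // global row inside the problem
    int col = b.y * tile + threadIdx.x;   // global column inside the problem
    if (row < n && col <= row)            // keep only the lower-triangular part
        out[row * n + col] = (float)(row + col);  // placeholder computation
}

int main()
{
    const int n = 1024, tile = 16;
    const int nb = (n + tile - 1) / tile;        // tiles per dimension
    const int numBlocks = nb * (nb + 1) / 2;     // triangular number of tiles
    float *d_out;
    cudaMalloc(&d_out, (size_t)n * n * sizeof(float));
    td_kernel<<<numBlocks, dim3(tile, tile)>>>(d_out, n, tile);
    cudaDeviceSynchronize();
    cudaFree(d_out);
    return 0;
}
```

Under these assumptions, only nb(nb+1)/2 blocks are launched instead of the nb² blocks of a bounding-box grid, and only the diagonal tiles still contain masked-out threads, which matches the reduction of unnecessary work from O(n²) to O(n) described above.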
