Abstract

The recently developed Threaded Many-core Memory (TMM) model provides a framework for analyzing algorithms for highly-threaded many-core machines such as GPUs. In particular, it aims to capture the fact that these machines hide memory latency through the use of a large number of threads and large memory bandwidth. A TMM analysis has two components: computational complexity and memory complexity. A model is only useful if it can explain and predict empirical data. In this work, we investigate the effectiveness of the TMM model. We analyze algorithms for five classic problems (suffix tree/array construction for string matching, fast Fourier transform, merge sort, list ranking, and all-pairs shortest paths) under this model, and compare the results of the analysis with our experimental findings and those of other researchers who have implemented and measured the performance of these algorithms on a spectrum of diverse GPUs. We find that the TMM model is able to predict important, and sometimes previously unexplained, trends and artifacts in the experimental data.
