Abstract

The recently developed Threaded Many-core Memory (TMM) model provides a framework for analyzing algorithms for highly threaded many-core machines such as GPUs and Cray supercomputers. In particular, it tries to capture the fact that these machines hide memory latency by using a large number of threads and high memory bandwidth. A TMM analysis has two components: computational complexity and memory complexity. A model is only useful if it can explain and predict empirical data, so in this work we investigate the effectiveness of the TMM model. Under this model, we analyze algorithms for five classic problems (suffix tree/array for string matching, fast Fourier transform, merge sort, list ranking, and all-pairs shortest paths) on a variety of GPUs. We also analyze memory access, matrix multiplication, and a sequence alignment algorithm on a set of Cray XMT supercomputers and the latest NVIDIA and AMD GPUs. We compare the results of the analysis with our own experimental findings and those of other researchers who have implemented these algorithms and measured their performance on a diverse spectrum of GPUs and Cray appliances. We find that the TMM model is able to predict important, non-trivial, and sometimes previously unexplained trends and artifacts in the experimental data.

