Simple memory machine models for GPUs

Koji Nakano

doi:10.1080/17445760.2012.731507

Abstract

The main contribution of this paper is to introduce two parallel memory machines, the discrete memory machine (DMM) and the unified memory machine (UMM). Unlike well-studied theoretical parallel computational models such as parallel random access machines, these parallel memory machines are practical and capture the essential features of memory access by graphics processing units (GPUs). As a first step toward developing algorithmic techniques on the DMM and the UMM, we evaluate the computing time for contiguous access and stride access to the memory on these models. We then present parallel algorithms to transpose a 2D array on these models and evaluate their performance. Finally, we show that, for any permutation given offline, data in an array can be moved efficiently along the given permutation on both the DMM and the UMM. Since the computing time of our permutation algorithms on the DMM and the UMM is equal to the sum of the lower bounds obtained from the memory bandwidth limitation and the latency limitation, they are optimal from the theoretical point of view. We believe that the DMM and the UMM can be good theoretical platforms for developing algorithmic techniques for GPUs.
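
The difference between contiguous and stride access mentioned in the abstract can be illustrated with a small sketch. The abstract does not define the models, so everything below is an assumption made for illustration: a warp of w threads issues w addresses at once; on the DMM, address a is assumed to live in bank a mod w and requests to the same bank are serialized; on the UMM, address a is assumed to belong to address group floor(a / w) and each distinct group costs one stage. The function names `dmm_stages` and `umm_stages` are hypothetical, and latency is ignored entirely.

```python
from collections import Counter


def dmm_stages(addresses, w):
    """Stages one warp needs on the simplified DMM (assumption: bank = address mod w).

    Requests that fall into the same bank are processed one after another,
    so the cost is the largest number of requests hitting any single bank.
    """
    per_bank = Counter(a % w for a in addresses)
    return max(per_bank.values())


def umm_stages(addresses, w):
    """Stages one warp needs on the simplified UMM (assumption: group = address // w).

    Each distinct address group touched by the warp costs one stage.
    """
    return len({a // w for a in addresses})


w = 4  # assumed width: threads per warp and number of banks

contiguous = [0, 1, 2, 3]    # thread i reads address i
stride = [0, 4, 8, 12]       # thread i reads address i * w
diagonal = [0, 5, 10, 15]    # thread i reads address i * (w + 1)

for name, addrs in [("contiguous", contiguous), ("stride", stride), ("diagonal", diagonal)]:
    print(name, "DMM:", dmm_stages(addrs, w), "UMM:", umm_stages(addrs, w))
```

Under these assumptions, contiguous access costs one stage on both models and stride access costs w stages on both, while the diagonal pattern is conflict-free on the simplified DMM but still touches w address groups on the simplified UMM.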

Journal: International Journal of Parallel, Emergent and Distributed Systems
Publication Date: Nov 27, 2012
Citations: 19

Similar Papers

Simple Memory Machine Models for GPUs
Koji Nakano
01 May 2012

Optimal Parallel Algorithms for Computing the Sum, the Prefix-Sums, and the Summed Area Table on the Memory Machine Models
Koji Nakano
IEICE Transactions on Information and Systems | VOL. E96.D
01 Jan 2013

Optimal implementations of the approximate string matching and the approximate discrete signal matching on the memory machine models
Koji Nakano
International Journal of Parallel, Emergent and Distributed Systems | VOL. 29
16 Apr 2013

Offline Permutation Algorithms on the Discrete Memory Machine with Performance Evaluation on the GPU
Akihiko Kasagi ... Koji Nakano
IEICE Transactions on Information and Systems | VOL. E96.D
01 Jan 2013
