Optimality of Fundamental Parallel Algorithms on the Hierarchical Memory Machine, with GPU Implementation

Koji Nakano, Yasuaki Ito

doi:10.1109/pdp.2015.46

Abstract

The Hierarchical Memory Machine (HMM) is a theoretical parallel computing model that captures the essence of CUDA-enabled GPU architectures. It consists of multiple streaming multiprocessors, each with its own shared memory, and a global memory that can be accessed by all threads. The HMM has several parameters: the number $d$ of streaming multiprocessors, the number $p$ of threads per streaming multiprocessor, the number $w$ of memory banks in each shared memory and in the global memory, the shared memory latency $l$, and the global memory latency $L$. The main purpose of this paper is to discuss the optimality of fundamental parallel algorithms running on the HMM. We first show that image convolution for an image with $n \times n$ pixels using a filter of size $(2v+1) \times (2v+1)$ can be done in $O(n^2/w + n^2 L/(dp) + n^2 v^2/(dw) + n^2 v^2 l/(dp))$ time units on the HMM. Further, we show that this parallel implementation is time optimal by proving a matching lower bound on the running time. We then go on to show that the product of two $n \times n$ matrices can be computed in $O(n^3/(mw) + n^3 L/(mdp) + n^3/(dw) + n^3 l/(dp))$ time units on the HMM if the capacity of the shared memory in each streaming multiprocessor is $O(m^2)$. This implementation is also proved to be time optimal. We further clarify the conditions under which image convolution and matrix multiplication hide the memory access latency overhead and maximize the global memory throughput and the parallelism. Finally, we provide experimental results on a GeForce GTX Titan that support our theoretical analysis.
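
To make the two bounds concrete, the following Python sketch evaluates each term of the convolution and matrix multiplication bounds and reports which one dominates. It is an illustration under stated assumptions, not the authors' analysis: every hidden constant factor is set to 1, the parameter values (d=14, p=1024, w=32, l=5, L=100, n=1024, v=2, m=32) are hypothetical rather than the measured GTX Titan figures, and the labels attached to the terms (global/shared bandwidth/latency) are our reading of the formulas, not terminology taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HMM:
    """HMM parameters, named as in the abstract."""
    d: int  # number of streaming multiprocessors
    p: int  # number of threads per streaming multiprocessor
    w: int  # number of memory banks (per shared memory, and in the global memory)
    l: int  # shared memory latency
    L: int  # global memory latency


def convolution_terms(h: HMM, n: int, v: int) -> dict:
    """Terms of O(n^2/w + n^2 L/(dp) + n^2 v^2/(dw) + n^2 v^2 l/(dp)),
    with every hidden constant factor set to 1 (illustrative assumption)."""
    n2, v2 = n * n, v * v
    return {
        "global bandwidth": n2 / h.w,
        "global latency": n2 * h.L / (h.d * h.p),
        "shared bandwidth": n2 * v2 / (h.d * h.w),
        "shared latency": n2 * v2 * h.l / (h.d * h.p),
    }


def matmul_terms(h: HMM, n: int, m: int) -> dict:
    """Terms of O(n^3/(mw) + n^3 L/(mdp) + n^3/(dw) + n^3 l/(dp)),
    assuming shared memory capacity O(m^2); constants again set to 1."""
    n3 = n ** 3
    return {
        "global bandwidth": n3 / (m * h.w),
        "global latency": n3 * h.L / (m * h.d * h.p),
        "shared bandwidth": n3 / (h.d * h.w),
        "shared latency": n3 * h.l / (h.d * h.p),
    }


def dominant(terms: dict) -> str:
    """Name of the largest term, i.e. the one that dominates the sum."""
    return max(terms, key=terms.get)


# Hypothetical parameter values chosen only for illustration.
example = HMM(d=14, p=1024, w=32, l=5, L=100)

if __name__ == "__main__":
    conv = convolution_terms(example, n=1024, v=2)  # 5 x 5 filter
    mm = matmul_terms(example, n=1024, m=32)        # hypothetical m = 32
    print(dominant(conv), conv)
    print(dominant(mm), mm)
```

With these made-up values the convolution is bounded by the $n^2/w$ term and the matrix product by the $n^3/(dw)$ term; changing $L$, $p$, or $m$ shifts which term dominates, which is the kind of trade-off the latency-hiding conditions in the paper address.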

Similar Papers

A Time Optimal Parallel Algorithm for the Dynamic Programming on the Hierarchical Memory Machine
Koji Nakano, 01 Dec 2014

The Hierarchical Memory Machine Model for GPUs
Koji Nakano, 01 May 2013

Adapting to Hostile Architectural Environments
Scalable Computing Practice and Experience, Vol. 2, 01 Jan 1998

The Approximate String Matching on the Hierarchical Memory Machine, with Performance Evaluation
Duhu Man ... Koji Nakano, 01 Sep 2013

Editage

Paperpal

R Discovery

Mind the Graph

R Discovery Prime

R Discovery Prime

Optimality of Fundamental Parallel Algorithms on the Hierarchical Memory Machine, with GPU Implementation

Abstract

Talk to us

Similar Papers