Abstract

Modern GPUs (Graphics Processing Units) can be used for general-purpose parallel computation. Users can develop parallel programs running on GPUs using a programming architecture called CUDA (Compute Unified Device Architecture). The matrix chain product problem is an optimization problem of finding a parenthesization of a matrix chain that minimizes the total number of multiplications needed to compute the product of the chain. It is well known that this problem can be solved by dynamic programming in O(n³) time using tables of size O(n²). The main contribution of this paper is an efficient parallel implementation of this O(n³)-time algorithm on the GPU. Our implementation takes the architectural and programming issues of the GPU system into account. The experimental results show that, for a chain of 16384 matrices generated at random, our implementation on the Nvidia GeForce GTX 480 achieves a speedup factor of 40 over a conventional CPU implementation.
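The abstract does not reproduce the algorithm itself; for reference, the following is a minimal sequential sketch of the well-known O(n³)-time, O(n²)-space dynamic program that the paper parallelizes. It is not the authors' GPU implementation, and the function and variable names are illustrative only.

```cpp
#include <cstdio>
#include <vector>
#include <algorithm>
#include <climits>

// Standard dynamic program for the matrix chain product problem.
// Matrix A_i has dimensions p[i-1] x p[i], so a chain of n matrices
// is described by n+1 dimensions. m[i][j] stores the minimum number
// of scalar multiplications needed to compute the product A_i ... A_j.
long long matrix_chain_order(const std::vector<long long>& p) {
    int n = static_cast<int>(p.size()) - 1;   // number of matrices in the chain
    std::vector<std::vector<long long>> m(n + 1, std::vector<long long>(n + 1, 0));

    for (int len = 2; len <= n; ++len) {      // length of the sub-chain considered
        for (int i = 1; i + len - 1 <= n; ++i) {
            int j = i + len - 1;
            m[i][j] = LLONG_MAX;
            for (int k = i; k < j; ++k) {     // split into A_i..A_k and A_{k+1}..A_j
                long long cost = m[i][k] + m[k + 1][j] + p[i - 1] * p[k] * p[j];
                m[i][j] = std::min(m[i][j], cost);
            }
        }
    }
    return m[1][n];                           // cost of the full chain A_1 ... A_n
}

int main() {
    // Example chain: A1 (10x30), A2 (30x5), A3 (5x60).
    // Optimal parenthesization (A1 A2) A3 costs 1500 + 3000 = 4500 multiplications.
    std::vector<long long> p = {10, 30, 5, 60};
    std::printf("minimum multiplications: %lld\n", matrix_chain_order(p));
    return 0;
}
```

The three nested loops account for the O(n³) running time, and the single n×n cost table accounts for the O(n²) space mentioned in the abstract; the paper's contribution is mapping this computation efficiently onto the GPU.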
