Abstract

Computation on the Graphics Processing Unit (GPU) has emerged as a cost-effective parallel computing paradigm for high-performance computing that makes it possible to process large-scale data in parallel. The GPU is designed to perform complex mathematical and geometric tasks, primarily for 3D graphics functions. It can also be used for non-graphics, general-purpose computation, called General-Purpose Computing on GPU (GPGPU), a sub-discipline of High-Performance Computing (HPC). The use of the GPU alongside the CPU to accelerate complex scientific, engineering, and mathematical tasks is known as GPU-accelerated computing. In this paper, we propose an efficient tensor computation for the Hadamard Product (HP), which is directly applied in machine learning applications, especially in Long Short-Term Memory (LSTM) networks. HP computation becomes complex when higher-order tensors with millions of elements are considered, so a CPU-only, traditional serial implementation becomes tedious and inefficient. The contribution of this paper is twofold: first, we develop efficient algorithms for higher-order tensors by dimension conversion; then we apply the algorithms on the GPU to speed up the computation. To do so, we develop an efficient partitioning scheme for higher-order tensors. We use the CUDA (Compute Unified Device Architecture) C programming model developed by NVIDIA to implement the algorithms. We compared these algorithms with a Traditional Multidimensional Array (TMA) based algorithm and found improved results.
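
To illustrate the kind of computation the paper accelerates, the following is a minimal sketch (not the authors' partitioned algorithm) of an element-wise Hadamard product in CUDA C, assuming the higher-order tensors have been flattened to contiguous 1-D device arrays; the kernel name and sizes are illustrative only.

```cuda
// Hypothetical sketch: element-wise Hadamard product c = a .* b,
// with both tensors flattened to 1-D arrays and one thread per element.
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

__global__ void hadamard(const float *a, const float *b, float *c, size_t n)
{
    size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] * b[i];   // element-wise multiplication
}

int main(void)
{
    const size_t n = 1 << 20;             // e.g. a higher-order tensor flattened to 2^20 elements
    const size_t bytes = n * sizeof(float);

    // Host allocation and initialization
    float *ha = (float *)malloc(bytes), *hb = (float *)malloc(bytes), *hc = (float *)malloc(bytes);
    for (size_t i = 0; i < n; ++i) { ha[i] = 1.0f; hb[i] = 2.0f; }

    // Device allocation and host-to-device copies
    float *da, *db, *dc;
    cudaMalloc(&da, bytes); cudaMalloc(&db, bytes); cudaMalloc(&dc, bytes);
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    // Launch one thread per element
    const int threads = 256;
    const int blocks = (int)((n + threads - 1) / threads);
    hadamard<<<blocks, threads>>>(da, db, dc, n);
    cudaDeviceSynchronize();

    cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", hc[0]);         // expect 2.0

    cudaFree(da); cudaFree(db); cudaFree(dc);
    free(ha); free(hb); free(hc);
    return 0;
}
```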
