Efficient convolution pooling on the GPU

Shunsuke Suita, Takahiro Nishimura, Hiroki Tokura, Koji Nakano, Yasuaki Ito, Akihiko Kasagi, Tsuguchika Tabaru

doi:10.1016/j.jpdc.2019.12.006

Open Access

https://doi.org/10.1016/j.jpdc.2019.12.006

Abstract

The main contribution of this paper is to present efficient implementations of convolution-pooling on the GPU, in which pooling follows multiple convolution. Because multiple convolution and pooling operations are performed alternately in the earlier stages of many Convolutional Neural Networks (CNNs), accelerating convolution-pooling is very important. Our new GPU implementation uses two techniques: (1) convolution interchange with direct sum, and (2) conversion to matrix multiplication. These techniques reduce both the computational cost and the memory access cost. Furthermore, the interchanged convolution is converted to matrix multiplication, which cuBLAS can compute very efficiently. Experimental results using a Tesla V100 GPU show that our new cuDNN-compatible GPU implementation of convolution-pooling is expected to be 2.90 times faster for fp32 and 1.43 times faster for fp16 than multiple convolution followed by pooling with cuDNN, the most popular library of primitives for implementing CNNs on the GPU.
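
The two techniques named in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes a single input channel, stride-1 "valid" convolution written as cross-correlation, and non-overlapping p x p sum pooling; the abstract does not state the pooling type, and the identity below relies on pooling being linear, so it would not hold for max pooling. All function names are hypothetical. The sketch first sums the input over sliding p x p windows, then applies the kernel at stride p, and finally expresses that strided convolution as one matrix product over several kernels.

```python
# Illustrative sketch only. Assumptions (not taken from the paper): one input
# channel, stride-1 valid convolution, non-overlapping p x p SUM pooling.

def conv2d(x, w, stride=1):
    """Valid cross-correlation of 2-D list x with kernel w at the given stride."""
    k_h, k_w = len(w), len(w[0])
    out_h = (len(x) - k_h) // stride + 1
    out_w = (len(x[0]) - k_w) // stride + 1
    return [[sum(x[i * stride + a][j * stride + b] * w[a][b]
                 for a in range(k_h) for b in range(k_w))
             for j in range(out_w)]
            for i in range(out_h)]

def sum_pool(y, p):
    """Non-overlapping p x p sum pooling (trailing rows/columns are dropped)."""
    return [[sum(y[i * p + u][j * p + v] for u in range(p) for v in range(p))
             for j in range(len(y[0]) // p)]
            for i in range(len(y) // p)]

def window_sum(x, p):
    """Stride-1 sliding p x p sums of the input."""
    return [[sum(x[r + u][c + v] for u in range(p) for v in range(p))
             for c in range(len(x[0]) - p + 1)]
            for r in range(len(x) - p + 1)]

def conv_then_pool(x, w, p):
    """Baseline: full convolution, then pooling."""
    return sum_pool(conv2d(x, w), p)

def pool_interchanged(x, w, p):
    """Interchanged order: sum the input first, then convolve at stride p."""
    return conv2d(window_sum(x, p), w, stride=p)

def im2col(s, k, stride):
    """One row per output position, one column per kernel tap."""
    out_h = (len(s) - k) // stride + 1
    out_w = (len(s[0]) - k) // stride + 1
    return [[s[i * stride + a][j * stride + b]
             for a in range(k) for b in range(k)]
            for i in range(out_h) for j in range(out_w)]

def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(row[t] * b[t][c] for t in range(len(b)))
             for c in range(len(b[0]))]
            for row in a]

def pool_as_matmul(x, kernels, p):
    """Column f of the result is the flattened pooled map for kernels[f]."""
    k = len(kernels[0])
    patches = im2col(window_sum(x, p), k, p)          # positions x (k*k)
    weights = [[w[a][b] for w in kernels]             # (k*k) x filters
               for a in range(k) for b in range(k)]
    return matmul(patches, weights)

# Small integer example so the comparison is exact.
x = [[(3 * i + 5 * j) % 7 - 3 for j in range(6)] for i in range(6)]
kernels = [[[1, 0, -1], [2, 0, -2], [1, 0, -1]],
           [[0, 1, 0], [1, -4, 1], [0, 1, 0]]]
p = 2

baseline = [conv_then_pool(x, w, p) for w in kernels]
interchanged = [pool_interchanged(x, w, p) for w in kernels]
as_matrix = pool_as_matmul(x, kernels, p)
```

In this 6 x 6 example with a 3 x 3 kernel and p = 2, the baseline evaluates 16 convolution outputs (144 multiplications per kernel), while the interchanged order evaluates only the 4 pooled outputs (36 multiplications per kernel) plus the window sums, which are additions shared by all kernels. The final product is the kind of dense matrix multiplication that a GEMM routine could take over on the GPU.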

Journal: Journal of Parallel and Distributed Computing
Publication Date: Jan 7, 2020
Citations: 12
License type: CC BY-NC-ND

Similar Papers

Efficient cuDNN-Compatible Convolution-Pooling on the GPU
Shunsuke Suita ... Akihiko Kasagi
01 Jan 2020

Deep Cost Adaptive Convolutional Network: A Classification Method for Imbalanced Mechanical Data
Xun Dong ... Kesi Li
IEEE Access | VOL. 8
01 Jan 2020

Fault Diagnosis Method for Aircraft EHA Based on FCNN and MSPSO Hyperparameter Optimization
Xudong Li ... Yuyuan Cao
Applied Sciences | VOL. 12
26 Aug 2022

Computation and memory optimized spectral domain convolutional neural network for throughput and energy-efficient inference.
Shahriyar Masud Rizvi ... Usman Ullah Sheikh
Applied Intelligence | VOL. 53
11 Jun 2022
