Abstract

Neural networks and deep learning currently provide promising solutions to many practical problems. One of the difficulties in building neural network models is the training process, which requires finding an optimal solution for the network weights. The Particle Swarm Optimization (PSO) algorithm has recently been applied to neural network training due to its global search capability. However, the PSO algorithm suffers from long execution times. In this paper, a parallel design of the PSO algorithm is proposed, implemented in OpenCL on a GPU. To improve performance, fine-grained memory allocation is used for the parallel particle processing, and an efficient parallel reduction scheme based on local and global reduction is proposed. By fully utilizing the processing power of the GPU, the OpenCL PSO implementation accelerates neural network training by up to 35 times compared to a multithreaded C++ implementation on a CPU.
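As context for the local/global reduction scheme the abstract mentions, the sketch below shows the standard two-level reduction pattern in OpenCL C: each work-group sums its elements in local memory (local reduction), and the per-group partial sums are then combined in a second pass (global reduction). This is a minimal illustration of the general technique, not the authors' actual kernel; the kernel name `local_reduce` and the buffer names are hypothetical.

```c
// Assumed pattern: each work-item loads one per-particle error term,
// the work-group reduces it in local memory, and work-item 0 writes
// one partial sum per group. A second pass over `partials` (or a host
// sum) completes the global reduction. Local size must be a power of two.
__kernel void local_reduce(__global const float *errors,   // per-particle error terms
                           __global float *partials,       // one partial sum per work-group
                           __local  float *scratch,        // work-group scratch buffer
                           const uint n)
{
    uint gid = get_global_id(0);
    uint lid = get_local_id(0);

    // Load one element per work-item; pad out-of-range items with zero.
    scratch[lid] = (gid < n) ? errors[gid] : 0.0f;
    barrier(CLK_LOCAL_MEM_FENCE);

    // Tree-based reduction within the work-group.
    for (uint stride = get_local_size(0) / 2; stride > 0; stride /= 2) {
        if (lid < stride)
            scratch[lid] += scratch[lid + stride];
        barrier(CLK_LOCAL_MEM_FENCE);
    }

    // Work-item 0 publishes the group's partial sum to global memory.
    if (lid == 0)
        partials[get_group_id(0)] = scratch[0];
}
```

In a typical design, the host re-launches the same kernel over `partials` until a single value remains, avoiding any global synchronization inside one kernel launch.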
