Abstract

Neural networks and deep learning currently provide promising solutions to many practical problems. One of the difficulties in building neural network models is the training process, which requires finding an optimal solution for the network weights. The Particle Swarm Optimization (PSO) algorithm has recently been applied to neural network training due to its global search capability. However, the PSO algorithm suffers from long execution times. In this paper, a parallel design of the PSO algorithm is proposed, implemented in OpenCL on a GPU. To improve performance, fine-grained memory allocation is used for the parallel particle processing, and an efficient parallel reduction scheme based on local and global reduction is proposed. By fully utilizing the processing power of the GPU, the OpenCL PSO implementation accelerates neural network training by up to 35 times compared to a multithreaded C++ implementation on a CPU.
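As context for the local/global reduction scheme the abstract mentions, the sketch below shows the standard two-level reduction pattern in OpenCL C: each work-group sums its elements in local memory (local reduction), and the per-group partial sums are then combined in a second pass (global reduction). This is a minimal illustration of the general technique, not the authors' actual kernel; the kernel name `local_reduce` and the buffer names are hypothetical.

```c
// Assumed pattern: each work-item loads one per-particle error term,
// the work-group reduces it in local memory, and work-item 0 writes
// one partial sum per group. A second pass over `partials` (or a host
// sum) completes the global reduction. Local size must be a power of two.
__kernel void local_reduce(__global const float *errors,   // per-particle error terms
                           __global float *partials,       // one partial sum per work-group
                           __local  float *scratch,        // work-group scratch buffer
                           const uint n)
{
    uint gid = get_global_id(0);
    uint lid = get_local_id(0);

    // Load one element per work-item; pad out-of-range items with zero.
    scratch[lid] = (gid < n) ? errors[gid] : 0.0f;
    barrier(CLK_LOCAL_MEM_FENCE);

    // Tree-based reduction within the work-group.
    for (uint stride = get_local_size(0) / 2; stride > 0; stride /= 2) {
        if (lid < stride)
            scratch[lid] += scratch[lid + stride];
        barrier(CLK_LOCAL_MEM_FENCE);
    }

    // Work-item 0 publishes the group's partial sum to global memory.
    if (lid == 0)
        partials[get_group_id(0)] = scratch[0];
}
```

In a typical design, the host re-launches the same kernel over `partials` until a single value remains, avoiding any global synchronization inside one kernel launch.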
