Abstract

Large-scale artificial neural networks (ANNs) are widely used in data-processing applications. The recurrent neural network (RNN) is a type of neural network equipped with additional recurrent connections. This architecture enables an RNN to remember previously processed information and makes it an expressive model for nonlinear sequence-processing tasks. However, the high computational cost of training RNNs has significantly limited research on them over the last 20 years. In recent years, graphics processing units (GPUs) have become an important means of speeding up the training of large-scale neural networks by exploiting their massive parallelism. In this paper, we propose an efficient GPU implementation of a large-scale recurrent neural network and demonstrate the power of scaling up RNNs with GPUs. We first explore the potential parallelism of the RNN and propose a fine-grained two-stage pipeline implementation. Experimental results show that the proposed GPU implementation achieves a speed-up of 2x to 11x over a baseline CPU implementation using the Intel Math Kernel Library. We then use the proposed GPU implementation to scale up the RNN and improve its performance. Experimental results on the Microsoft Research Sentence Completion Challenge demonstrate that a large-scale RNN without a class layer outperforms the traditional class-based, modest-size RNN, achieving an accuracy of 47%, the best result reported for a single RNN on this dataset.
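To make the recurrent architecture described above concrete, the following is a minimal NumPy sketch of an Elman-style hidden-state update, in which the recurrent term carries information forward across time steps. All names and dimensions here are hypothetical and chosen for illustration; this is not the paper's GPU pipeline implementation.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One Elman-style RNN step: the recurrent term W_hh @ h_prev
    is what lets the network remember previously processed inputs."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

# Toy dimensions, for illustration only.
rng = np.random.default_rng(0)
n_in, n_hid, T = 8, 16, 5

W_xh = rng.standard_normal((n_hid, n_in)) * 0.1   # input-to-hidden weights
W_hh = rng.standard_normal((n_hid, n_hid)) * 0.1  # recurrent hidden-to-hidden weights
b_h = np.zeros(n_hid)

h = np.zeros(n_hid)               # initial hidden state
for t in range(T):                # unroll over a short input sequence
    x_t = rng.standard_normal(n_in)
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)

print(h.shape)  # (16,)
```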
