It is difficult to train deep recurrent neural networks (RNNs) to learn the complex dependencies in sequential data because of vanishing and exploding gradients and learning conflicts. To address these learning problems, this article proposes a partially recurrent network (PR-Net), which consists of partially recurrent (PR) layers with highway connections. The proposed PR layer explicitly arranges its neurons into a memory module and an output module, both implemented as highway networks. The memory module places the memory on the information highway to maintain long short-term memory (LSTM). The output module places the external input on the information highway to approximate the complex transition function from one step to the next. Because of the highway connections in the output module, the PR layer can easily pass gradient information across layers, which allows a PR-Net with multiple stacked PR layers to be trained. Furthermore, the explicit separation of the memory module and the output module relieves the RNN of having to learn which neurons are responsible for memory and which for output. With a comparable number of parameters, sublayers per recurrent layer, and stacked recurrent layers, the proposed PR-Net significantly outperformed the recurrent highway network and LSTM on three sequence-learning benchmarks.
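The following is a minimal sketch of one possible reading of the PR layer described above: a memory module whose highway carry path holds the previous memory state, and an output module whose carry path holds the (projected) external input. The class name `PRLayer`, the gating equations, and the parameterisation are illustrative assumptions based only on this abstract, not the authors' published formulation.

```python
import torch
import torch.nn as nn

class PRLayer(nn.Module):
    """Hypothetical sketch of a partially recurrent (PR) layer.

    Memory module: highway sublayer with the previous memory m_{t-1}
    on the carry path (to maintain long-term memory).
    Output module: highway sublayer with the external input x_t on the
    carry path (to approximate the step-to-step transition function).
    Exact equations are assumptions; see the paper for the real ones.
    """

    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.hidden_size = hidden_size
        # Memory module: transform H_m and gate T_m over [x_t, h_{t-1}, m_{t-1}]
        self.mem_transform = nn.Linear(input_size + 2 * hidden_size, hidden_size)
        self.mem_gate = nn.Linear(input_size + 2 * hidden_size, hidden_size)
        # Output module: transform H_o and gate T_o; its carry path keeps a
        # projection of the external input instead of the previous state
        self.in_proj = nn.Linear(input_size, hidden_size)  # assumed projection
        self.out_transform = nn.Linear(input_size + 2 * hidden_size, hidden_size)
        self.out_gate = nn.Linear(input_size + 2 * hidden_size, hidden_size)

    def forward(self, x_t, h_prev, m_prev):
        z = torch.cat([x_t, h_prev, m_prev], dim=-1)
        # Memory module: highway step, carrying m_{t-1} through the gate
        t_m = torch.sigmoid(self.mem_gate(z))
        m_t = t_m * torch.tanh(self.mem_transform(z)) + (1.0 - t_m) * m_prev
        # Output module: highway step, carrying the external input through
        z_o = torch.cat([x_t, h_prev, m_t], dim=-1)
        t_o = torch.sigmoid(self.out_gate(z_o))
        h_t = t_o * torch.tanh(self.out_transform(z_o)) + (1.0 - t_o) * self.in_proj(x_t)
        return h_t, m_t
```

Under this reading, stacking PR layers would mean feeding h_t of one layer as x_t of the next, so the input-on-highway carry path in each output module provides a near-identity route for gradients across the depth of the stack.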