Abstract

Reservoir computing (RC) offers efficient temporal data processing at a low training cost by separating the recurrent neural network into a fixed network with recurrent connections and a trainable linear readout. The quality of the fixed network, called the reservoir, is the most important factor determining the performance of the RC system. In this paper, we investigate the influence of a hierarchical reservoir structure on the properties of the reservoir and the performance of the RC system. Analogous to deep neural networks, stacking sub-reservoirs in series is an efficient way to enhance the nonlinearity of the transformation of data into a high-dimensional space and to expand the diversity of temporal information captured by the reservoir. These deep reservoir systems perform better than simply increasing the size of the reservoir or the number of independent sub-reservoirs. Low-frequency components are mainly captured by the sub-reservoirs in the later stages of the deep reservoir structure, similar to the observation that more abstract information is extracted by the later layers of deep neural networks. When the total size of the reservoir is fixed, the tradeoff between the number of sub-reservoirs and the size of each sub-reservoir must be considered carefully, because the ability of individual sub-reservoirs degrades at small sizes. The improved performance of the deep reservoir structure also alleviates the difficulty of implementing RC systems in hardware.
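
To make the architecture concrete, the sketch below shows a minimal stacked (deep) echo state network in which each sub-reservoir is driven by the state of the previous one and a single ridge-regression readout is trained on the concatenated sub-reservoir states. This is an illustrative sketch, not the paper's implementation; the class and parameter names (DeepESN, spectral_radius, ridge) and the choice of tanh sub-reservoirs are assumptions.

```python
# Minimal sketch of a deep (stacked) echo state network, assuming tanh
# sub-reservoirs and a ridge-regression readout trained on the
# concatenated sub-reservoir states. Names are illustrative, not from the paper.
import numpy as np

class DeepESN:
    def __init__(self, n_in, layer_sizes, spectral_radius=0.9, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in, self.W_res = [], []
        prev = n_in
        for n in layer_sizes:
            self.W_in.append(rng.uniform(-0.5, 0.5, (n, prev)))
            W = rng.uniform(-0.5, 0.5, (n, n))
            W *= spectral_radius / max(abs(np.linalg.eigvals(W)))  # scale recurrent weights
            self.W_res.append(W)
            prev = n  # the next sub-reservoir is driven by this one's state
        self.W_out = None

    def run(self, u):
        """Drive the stack with input sequence u of shape (T, n_in); return (T, sum(layer_sizes))."""
        states = [np.zeros(W.shape[0]) for W in self.W_res]
        collected = []
        for u_t in u:
            drive = u_t
            for i, (W_in, W) in enumerate(zip(self.W_in, self.W_res)):
                states[i] = np.tanh(W_in @ drive + W @ states[i])
                drive = states[i]  # deep stacking: feed this state to the next layer
            collected.append(np.concatenate(states))
        # (a washout period for initial transients is omitted for brevity)
        return np.array(collected)

    def fit(self, u, y, ridge=1e-6):
        X = self.run(u)
        self.W_out = np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ y)

    def predict(self, u):
        return self.run(u) @ self.W_out
```

Instantiating, for example, DeepESN(1, [100, 100, 100]) mirrors the three stacked 100-node sub-reservoirs described in the highlights below, with a 300-dimensional state vector feeding the trainable linear readout.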

Highlights

  • We compare the performance of three different reservoir structures on time-series prediction tasks and discuss the effects of stacking sub-reservoirs on the properties of the reservoir computing (RC) system

  • As the performance of an RC system is strongly affected by the size of the readout network, we first fixed the size of the readout network to 300, corresponding to a single reservoir with 300 nodes (shallow echo state network, ESN), three independent sub-reservoirs with 100 nodes each (wide ESN), and three stacked sub-reservoirs with 100 nodes each (deep ESN); the structural difference is sketched in the code after this list
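
The snippet below sketches how the three layouts differ when viewed as a single 300-node recurrent coupling matrix: the wide ESN is block-diagonal with every sub-reservoir driven by the external input in parallel, while the deep ESN adds sub-diagonal blocks that feed each sub-reservoir's state into the next. The block-matrix view and the uniform weight ranges are our own illustration, not the paper's notation, and details such as whether inter-layer coupling uses the current or previous time step differ between deep-ESN formulations.

```python
# Structural comparison of the three reservoir layouts with a fixed
# 300-dimensional readout, viewed as one 300x300 recurrent coupling matrix.
# Sizes follow the paper (300 vs. 3 x 100); the block-matrix view is only
# an illustration of the connectivity pattern.
import numpy as np

rng = np.random.default_rng(0)
blocks = [rng.uniform(-0.5, 0.5, (100, 100)) for _ in range(3)]
couplings = [rng.uniform(-0.5, 0.5, (100, 100)) for _ in range(2)]

# Shallow ESN: one fully recurrent 300-node reservoir.
W_shallow = rng.uniform(-0.5, 0.5, (300, 300))

# Wide ESN: three independent 100-node sub-reservoirs -> block-diagonal
# coupling; each block is driven by the external input in parallel.
W_wide = np.zeros((300, 300))
for i, B in enumerate(blocks):
    W_wide[100 * i:100 * (i + 1), 100 * i:100 * (i + 1)] = B

# Deep ESN: the same three blocks in series -> extra sub-diagonal blocks
# feed each sub-reservoir's state into the next; only the first block
# receives the external input.
W_deep = W_wide.copy()
for i, C in enumerate(couplings):
    W_deep[100 * (i + 1):100 * (i + 2), 100 * i:100 * (i + 1)] = C

# All three layouts expose a 300-dimensional state vector to the readout,
# so the trainable linear network has the same size in each case.
print(W_shallow.shape, W_wide.shape, W_deep.shape)
```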

Introduction

Due to dramatically increased computing power, advances in algorithms, and abundantly available data, machine learning, especially deep learning [1], has recently been applied successfully in a wide range of artificial intelligence domains, from computer vision to natural language processing. Recurrent neural networks (RNNs), whose internal states evolve with the input sequence, are a natural choice for temporal tasks, but they are expensive to train. In error backpropagation through time, the standard method for training RNNs, calculating the gradients of the error with respect to the weights involves a large amount of computation because the current states depend on the current inputs as well as on the previous states and inputs. When the chain rule is applied to compute these gradients, repeatedly multiplying the derivatives of the current states with respect to the previous states can cause vanishing or exploding gradients [8], which make it difficult to find optimal weights. Long short-term memory (LSTM) networks mitigate these problems by introducing gated internal states. As the weight matrices of the internal gates are trained by gradient-based learning algorithms, LSTM networks have been applied in a broad range of applications by fine-tuning the network for a given task, but training these weight matrices still incurs significant computation costs.
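
As a concrete illustration of this gradient behaviour (our sketch, not code from the paper), the snippet below multiplies the per-step Jacobians of a simple tanh RNN over many time steps and reports the norm of the accumulated product; depending on the scale of the recurrent weights, this norm typically decays toward zero or grows rapidly, which is the vanishing/exploding-gradient effect described above.

```python
# Toy demonstration of vanishing/exploding gradients in backpropagation
# through time (an illustration, not code from the paper).
# For h_t = tanh(W h_{t-1} + W_in u_t), the Jacobian dh_t/dh_{t-1} is
# diag(1 - h_t**2) @ W; the gradient with respect to early states is the
# product of these Jacobians over time, whose norm can shrink or grow
# geometrically with the number of steps.
import numpy as np

def jacobian_product_norm(weight_scale, T=50, n=100, seed=0):
    rng = np.random.default_rng(seed)
    W = weight_scale * rng.standard_normal((n, n)) / np.sqrt(n)
    W_in = rng.standard_normal((n, 1))
    h = np.zeros(n)
    J = np.eye(n)  # accumulated dh_T / dh_0
    for _ in range(T):
        h = np.tanh(W @ h + W_in @ rng.standard_normal(1))
        J = (np.diag(1.0 - h**2) @ W) @ J
    return np.linalg.norm(J)

for scale in (0.5, 1.0, 2.0):
    print(f"weight scale {scale}: |dh_T/dh_0| ~ {jacobian_product_norm(scale):.2e}")
```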
