Abstract

Parametrizing recurrent neural networks (RNNs) to mitigate the vanishing and exploding gradient problem is critical for sequential learning. The problem can be traced to the eigenvalues of the recurrent weight matrix, which appear in the gradient of the loss function with respect to that matrix. To control these eigenvalues, an orthogonal constraint is imposed on the recurrent weight matrix. In this article, we analyze the design mechanisms of the decomposition methods underlying three orthogonally constrained RNNs (OCRNNs) and derive their corresponding training algorithms. We compare the performance of the four OCRNNs and a standard long short-term memory (LSTM) network on the synthetic benchmark copying and adding tasks. We find that the coordinate descent-based iterations of the two OCRNNs we propose achieve comparable or better test error and convergence speed than the remaining models. These two OCRNN models are also more cost-effective in terms of total number of parameters, which reduces the storage required by the model. Furthermore, we introduce the pseudospectrum to visualize and monitor the evolution of the orthogonality of the recurrent weight matrix, and we find that orthogonality is an empirically useful condition for convergence speed but not a rigorous theoretical guarantee. Finally, we deploy the three OCRNNs on the acoustic polyp detection problem and formulate the detection problem as sequential binary classification. We also demonstrate that our two proposed OCRNNs achieve comparable accuracy with a smaller total number of network parameters. Their quick inference time demonstrates their suitability for Internet of Things (IoT) applications.
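
The abstract refers to monitoring the orthogonality of the recurrent weight matrix through its pseudospectrum. The following is a minimal NumPy sketch, not the authors' code, of one way such monitoring could be done: it measures the deviation of a matrix W from orthogonality and samples the smallest singular value of zI - W on a complex grid, whose epsilon-sublevel set is the epsilon-pseudospectrum. The function names, grid bounds, and matrix size are illustrative assumptions.

```python
# Illustrative sketch (assumed, not from the paper): orthogonality deviation
# and a sampled pseudospectrum for a recurrent weight matrix W.
import numpy as np


def orthogonality_deviation(W: np.ndarray) -> float:
    """Frobenius norm of W^T W - I; zero exactly when W is orthogonal."""
    n = W.shape[0]
    return np.linalg.norm(W.T @ W - np.eye(n), ord="fro")


def pseudospectrum_grid(W: np.ndarray, re: np.ndarray, im: np.ndarray) -> np.ndarray:
    """Return sigma_min(z I - W) on a grid of complex points z = x + iy.

    The epsilon-pseudospectrum is the set {z : sigma_min(z I - W) <= epsilon},
    so plotting this grid (e.g. as contours) visualizes how far the spectrum
    drifts from the unit circle during training.
    """
    n = W.shape[0]
    sig = np.empty((len(im), len(re)))
    for i, y in enumerate(im):
        for j, x in enumerate(re):
            z = complex(x, y)
            # Singular values are returned in descending order; take the smallest.
            sig[i, j] = np.linalg.svd(z * np.eye(n) - W, compute_uv=False)[-1]
    return sig


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random orthogonal matrix via QR as a stand-in for a trained recurrent weight.
    W, _ = np.linalg.qr(rng.standard_normal((32, 32)))
    print("deviation from orthogonality:", orthogonality_deviation(W))

    re = np.linspace(-1.5, 1.5, 60)
    im = np.linspace(-1.5, 1.5, 60)
    sig = pseudospectrum_grid(W, re, im)
    # For an orthogonal W the eigenvalues lie on the unit circle, so small
    # values of sigma_min concentrate near |z| = 1.
    print("min sigma_min over grid:", sig.min())
```

In this sketch, a small orthogonality deviation together with a pseudospectrum concentrated near the unit circle would indicate that the constraint is being maintained during training; how tightly either quantity must be controlled in practice is an empirical question, as the abstract notes.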
