The optimization algorithm plays an important role in deep learning: it significantly affects the stability and efficiency of the training process, and consequently the accuracy of the neural network approximation. A suitable (initial) learning rate is crucial for the optimization algorithm. However, a small learning rate is usually needed to guarantee convergence, which results in a slow training process. In this work we develop efficient and energy-stable optimization methods for function approximation problems and for solving partial differential equations using deep learning. In particular, we consider the gradient flows arising from deep learning from the continuous point of view. For the space discretization we employ the particle method (PM) and the smoothed particle method (SPM); for the time discretization we adopt schemes based on the scalar auxiliary variable (SAV) approach, combined with the adaptive strategy used in the Adam algorithm. We present a number of numerical tests demonstrating that the SAV-based schemes significantly improve the efficiency and stability of the training as well as the accuracy of the neural network approximation. We also show that the SPM approach gives slightly better accuracy than the PM approach. We further demonstrate the advantage of using an adaptive learning rate when dealing with more complex problems.
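To illustrate the idea behind an SAV-based time discretization, the following is a minimal sketch and not the scheme used in this work. It assumes a plain gradient flow dθ/dt = -∇E(θ) with no linear operator splitting, no particle discretization, and no Adam-style adaptivity. The quadratic `energy`, the constant `C`, and the helper names `sav_step` and `run_sav` are hypothetical choices made for illustration. The auxiliary variable r approximates sqrt(E(θ) + C), and the first-order update can be solved in closed form, so r never increases regardless of the step size.

```python
import math

# Toy energy E(x, y) = 0.5 * (x^2 + 10 y^2); an assumption for illustration only.
def energy(theta):
    x, y = theta
    return 0.5 * (x * x + 10.0 * y * y)

def grad(theta):
    x, y = theta
    return [x, 10.0 * y]

def sav_step(theta, r, dt, C=1.0):
    """One first-order SAV step for d(theta)/dt = -grad E(theta).

    Discrete system being solved (s = sqrt(E(theta_n) + C), g = grad E(theta_n)):
        theta_new = theta - dt * (r_new / s) * g
        r_new - r = (1 / (2 s)) * g . (theta_new - theta)
    Eliminating theta_new gives r_new = r / (1 + dt * |g|^2 / (2 s^2)).
    """
    g = grad(theta)
    s = math.sqrt(energy(theta) + C)
    g_norm2 = sum(gi * gi for gi in g)
    r_new = r / (1.0 + dt * g_norm2 / (2.0 * s * s))
    scale = dt * r_new / s
    theta_new = [t - scale * gi for t, gi in zip(theta, g)]
    return theta_new, r_new

def run_sav(theta0, dt, steps, C=1.0):
    """Run several SAV steps and record the auxiliary variable r."""
    theta = list(theta0)
    r = math.sqrt(energy(theta) + C)  # consistent initial value for r
    history = [r]
    for _ in range(steps):
        theta, r = sav_step(theta, r, dt, C)
        history.append(r)
    return theta, history

if __name__ == "__main__":
    # dt = 0.5 makes plain gradient descent diverge on the 10*y^2 direction.
    theta, history = run_sav([1.0, 1.0], dt=0.5, steps=50)
    print("final theta:", theta)
    print("r is non-increasing:",
          all(b <= a for a, b in zip(history, history[1:])))
```

In this sketch the modified energy r² decays for any step size, which is the sense in which such a scheme is energy stable. The original energy E(θ) is not guaranteed to decrease monotonically, and how closely r tracks sqrt(E(θ) + C) depends on the step size.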