Abstract

Deep Neural Networks (DNNs) are widely used in a variety of applications. Although adaptive learning rate algorithms are attractive for DNN training, their theoretical performance remains unclear; published analyses consider only simple optimization settings such as convex optimization, which do not apply to DNN training. This paper proposes TSO-ALRA, a two-stage optimizer using an adaptive learning rate algorithm, based on a full analysis of two approaches that do suit DNNs: parameter updates along geodesics on the statistical manifold and the covariance structure of gradients. Our analysis reveals that the diagonal approximation used by existing adaptive learning rate algorithms inevitably degrades their efficiency. In addition, our analysis suggests that adaptive learning rate algorithms suffer drops in generalization performance in the last phase of training. To overcome these problems, TSO-ALRA combines an effective approximation technique with a switching strategy. Our experiments on several models and datasets show that TSO-ALRA converges efficiently while achieving high generalization performance.
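
The abstract does not describe TSO-ALRA's actual switching criterion or approximation technique, so the sketch below only illustrates the general idea of a two-stage schedule: an adaptive optimizer for fast early progress, followed by a switch to plain SGD to protect generalization in the last phase. The choice of Adam, SGD, the fixed `SWITCH_EPOCH` threshold, and the toy model are assumptions for illustration, not the paper's method.

```python
import torch
import torch.nn as nn

# Hedged sketch of a generic two-stage optimizer schedule.
# The switching rule (a fixed epoch threshold) is a hypothetical stand-in;
# the paper's actual criterion is not given in the abstract.
model = nn.Linear(10, 1)
loss_fn = nn.MSELoss()

adaptive_opt = torch.optim.Adam(model.parameters(), lr=1e-3)  # stage 1: adaptive learning rates
sgd_opt = torch.optim.SGD(model.parameters(), lr=1e-2)        # stage 2: plain SGD

SWITCH_EPOCH = 50  # assumed threshold, not from the paper

for epoch in range(100):
    optimizer = adaptive_opt if epoch < SWITCH_EPOCH else sgd_opt

    x = torch.randn(32, 10)  # toy batch
    y = torch.randn(32, 1)

    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
```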
