Abstract

Gradient-based optimization of deep machine learning models suffers from vanishing and exploding gradients, particularly when the computational graph becomes deep. In this work, we propose the tangent-space gradient optimization (TSGO) for probabilistic models to keep the gradients from vanishing or exploding. The central idea is to guarantee the orthogonality between the variational parameters and the gradients. The optimization is then implemented by rotating the parameter vector towards the direction of the gradient. We explain and test TSGO in tensor network (TN) machine learning, where the TN describes the joint probability distribution as a normalized state |ψ〉 in Hilbert space. We show that the gradient can be restricted to the tangent space of the 〈ψ|ψ〉 = 1 hypersphere. Instead of the additional adaptive methods used in deep learning to control the learning rate η, the learning rate of TSGO is naturally determined by the rotation angle θ as η = tan θ. Our numerical results reveal better convergence of TSGO in comparison with the off-the-shelf Adam optimizer.
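
To make the update rule concrete, below is a minimal NumPy sketch of one TSGO step on a single unit-norm parameter vector. The function name tsgo_step and the toy setup are illustrative assumptions, not the paper's implementation (which operates on the tensors of a tensor network); the sketch rotates towards the negative tangent-space gradient, assuming a loss to be minimized.

```python
# Minimal sketch of a single TSGO update step (assumption: the variational
# parameters are flattened into one unit-norm vector `psi`; the paper works
# with tensor-network tensors, which this toy version ignores).
import numpy as np

def tsgo_step(psi, grad, theta):
    """Rotate the unit vector `psi` towards the negative tangent-space gradient.

    psi   : (d,) array with ||psi|| = 1, the variational parameter vector.
    grad  : (d,) array, gradient of the loss with respect to psi.
    theta : rotation angle; the effective learning rate is eta = tan(theta).
    """
    # Project the gradient onto the tangent space of the unit sphere at psi,
    # enforcing orthogonality between the parameter vector and the gradient.
    g_tan = grad - np.dot(psi, grad) * psi
    g_hat = g_tan / np.linalg.norm(g_tan)

    # Rotate psi by theta towards -g_hat; since psi is orthogonal to g_hat,
    # the result stays exactly on the unit sphere.  This is equivalent to
    # psi - tan(theta) * g_hat followed by renormalization, hence eta = tan(theta).
    return np.cos(theta) * psi - np.sin(theta) * g_hat

# Usage on a random toy problem: the norm of psi is preserved by the update.
rng = np.random.default_rng(0)
psi = rng.normal(size=16)
psi /= np.linalg.norm(psi)
grad = rng.normal(size=16)              # placeholder gradient for illustration
psi = tsgo_step(psi, grad, theta=0.05)
print(np.linalg.norm(psi))              # ~1.0 up to floating-point error
```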
