Convergence Of Stochastic Gradient Descent Research Articles

Under mild assumptions, stochastic gradient methods asymptotically achieve an optimal rate of convergence if the arithmetic mean of all iterates is returned as the approximate optimal solution. However, in the absence of stochastic noise, the arithmetic mean of all iterates converges to the optimal solution considerably more slowly than the iterates themselves. Even in the presence of noise, when the stochastic gradient method is terminated after a finite number of steps, the arithmetic mean is not necessarily the best possible approximation to the unknown optimal solution. This paper aims to identify optimal strategies in a particularly simple case: the minimization of a strongly convex function with i.i.d. noise terms and termination after a finite number of steps. Explicit formulas for the stochastic error and the optimality error are derived as functions of certain parameters of the SGD method. The aim was to choose the parameters such that both the stochastic error and the optimality error are reduced compared to arithmetic averaging. This aim could not be achieved; however, by allowing a slight increase in the stochastic error, it was possible to select the parameters such that the optimality error is reduced significantly. This reduction has a strong effect on the approximate solution generated by the stochastic gradient method when only a moderate number of iterations is used or when the initial error is large. The numerical examples confirm the theoretical results and suggest that a generalization to non-quadratic objective functions may be possible.
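The finite-step setting described in the abstract above can be written out as follows. The notation (step sizes t_k, noise terms ξ_k, weights w_k, and a quadratic f with Hessian A) is an assumption chosen for illustration and is not necessarily the paper's; the split into a term driven by the initial error and a zero-mean term driven by the noise is one natural reading of "optimality error" and "stochastic error".

```latex
\begin{aligned}
x_{k+1} &= x_k - t_k\bigl(\nabla f(x_k) + \xi_k\bigr), \qquad \xi_k \ \text{i.i.d.},\ \mathbb{E}[\xi_k] = 0,\\
\bar{x}_n &= \sum_{k=0}^{n} w_k\, x_k, \qquad \sum_{k=0}^{n} w_k = 1 \qquad \bigl(\text{arithmetic mean: } w_k = \tfrac{1}{n+1}\bigr),\\[4pt]
f(x) &= \tfrac12 (x - x^*)^{\top} A\, (x - x^*), \qquad e_k := x_k - x^*, \qquad P_{k,j} := \prod_{i=j}^{k-1} (I - t_i A),\ P_{k,k} := I,\\
e_{k+1} &= (I - t_k A)\, e_k - t_k\, \xi_k,\\
\bar{x}_n - x^* &= \underbrace{\sum_{k=0}^{n} w_k\, P_{k,0}\, e_0}_{\text{driven by the initial error}} \;-\; \underbrace{\sum_{k=1}^{n} w_k \sum_{j=0}^{k-1} t_j\, P_{k,j+1}\, \xi_j}_{\text{zero-mean, driven by the noise}}.
\end{aligned}
```

Arithmetic averaging spreads weight evenly over the early iterates, so without noise the first term typically decays only like 1/n; weights that favour late iterates shrink that term, at the cost of averaging over fewer effective samples, which enlarges the second.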

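To make the trade-off concrete, here is a minimal Python sketch, assuming a one-dimensional quadratic f(x) = a/2 (x - x*)^2, additive Gaussian gradient noise, a decaying step size t_k = t0 / (k + 1)^0.6, and a hypothetical weighting w_k proportional to (k + 1)^2 that favours late iterates. None of these choices, nor the helper names `run_sgd` and `mean_squared_errors`, come from the paper; in particular the weights are not the optimized ones it derives.

```python
import random


def run_sgd(n_steps, x0, a=1.0, x_star=0.0, sigma=0.0, t0=0.5, decay=0.6,
            power=2.0, rng=None):
    """SGD on f(x) = a/2 * (x - x_star)**2 with Gaussian gradient noise.

    Returns |estimate - x_star| for three estimates: the last iterate,
    the arithmetic mean of x_0..x_n, and a weighted average with
    HYPOTHETICAL weights proportional to (k + 1)**power.
    """
    if rng is None:
        rng = random.Random(0)
    x = x0
    plain_sum = x0             # running sum for the arithmetic mean (includes x_0)
    weighted_sum = x0          # x_0 has weight 1**power = 1
    weight_total = 1.0
    for k in range(n_steps):
        step = t0 / (k + 1) ** decay                       # assumed decaying step size t_k
        grad = a * (x - x_star) + rng.gauss(0.0, sigma)    # noisy gradient (sigma=0: exact)
        x -= step * grad                                   # x_{k+1}
        w = (k + 2) ** power                               # hypothetical weight of x_{k+1}
        plain_sum += x
        weighted_sum += w * x
        weight_total += w
    mean = plain_sum / (n_steps + 1)
    weighted = weighted_sum / weight_total
    return abs(x - x_star), abs(mean - x_star), abs(weighted - x_star)


def mean_squared_errors(n_trials, n_steps, sigma, x0=0.0, seed=1):
    """Monte Carlo estimate of the mean squared error of each of the three estimates."""
    rng = random.Random(seed)
    totals = [0.0, 0.0, 0.0]
    for _ in range(n_trials):
        for i, err in enumerate(run_sgd(n_steps, x0, sigma=sigma, rng=rng)):
            totals[i] += err * err
    return [s / n_trials for s in totals]


if __name__ == "__main__":
    # No noise, large initial error: isolates the optimality error.
    print(run_sgd(1000, x0=10.0))
    # Noise, started at the optimum: isolates the stochastic error.
    print(mean_squared_errors(200, 1000, sigma=1.0))
```

With the noise switched off and x0 = 10, the last iterate ends up orders of magnitude closer to the optimum than the arithmetic mean, with the late-weighted average in between; in the noisy run started at the optimum, the arithmetic mean has a clearly smaller mean squared error than the last iterate. Varying `power` is a quick way to explore the trade-off between the two error components, though the specific weighting here is illustrative only.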
The nonlinear characteristics of bridge aerodynamics preclude a closed‐form solution for the limit‐cycle oscillation (LCO) amplitude and frequency in the post‐flutter stage. To address this issue, a long short‐term memory (LSTM) network is used as a reduced‐order model of the nonlinear aeroelastic forces on the bridge deck section, and it is employed repeatedly to generate force inputs at the spanwise nodes of a three‐dimensional (3D) finite element model (FEM) of the long‐span bridge built from spatial beam elements. All LSTM networks are dynamically coupled through the FEM, from which the 3D nonlinear flutter response is obtained. To improve the simulation accuracy and reduce the training data required by the standard LSTM network, general knowledge (motivated by the gating mechanism and by mathematical models of information processing) and domain knowledge (derived from a basic understanding of bridge aerodynamics) are leveraged to customize the LSTM cell and the network architecture, respectively. In addition, a fast‐training algorithm that combines the linear convergence of stochastic gradient descent with the superlinear convergence of a modified Broyden–Fletcher–Goldfarb–Shanno (BFGS) method is developed to improve the training efficiency of the resulting knowledge‐enhanced LSTM network. To further improve the computational efficiency of the coupled LSTM‐FEM nonlinear flutter analysis, convolution‐based numerical integration is adopted in the finite element modeling of the long‐span bridge dynamics. A case study of a long‐span suspension bridge under strong winds demonstrates that the proposed 3D nonlinear flutter analysis achieves high simulation efficiency and accuracy and can be used to obtain the nonlinear LCO characteristics over a wide range of post‐flutter wind speeds.
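The abstract names a two-stage idea: cheap stochastic gradient steps first, then a quasi-Newton method with faster local convergence. The sketch below is a minimal Python illustration of that general pattern under loudly stated assumptions: it trains a linear least-squares model rather than an LSTM, uses textbook BFGS with an Armijo backtracking line search instead of the paper's modified BFGS, and switches stages after a fixed number of SGD epochs, a rule chosen here only for simplicity. The function names (`loss_and_grad`, `sgd_phase`, `bfgs_phase`, `hybrid_train`) and all hyperparameters are hypothetical.

```python
import numpy as np


def loss_and_grad(w, X, y):
    """Least-squares loss 0.5 * mean((X @ w - y)**2) and its gradient (toy stand-in for an LSTM loss)."""
    r = X @ w - y
    return 0.5 * np.mean(r ** 2), X.T @ r / len(y)


def sgd_phase(w, X, y, rng, lr=0.05, epochs=5, batch=16):
    """Stage 1: minibatch SGD as a cheap warm start (lr, epochs, batch are hypothetical)."""
    n = len(y)
    for _ in range(epochs):
        order = rng.permutation(n)
        for start in range(0, n, batch):
            idx = order[start:start + batch]
            _, g = loss_and_grad(w, X[idx], y[idx])
            w = w - lr * g
    return w


def bfgs_phase(w, X, y, iters=50, tol=1e-9):
    """Stage 2: textbook BFGS (NOT the paper's modified variant) on the full batch."""
    eye = np.eye(len(w))
    H = eye.copy()                               # inverse-Hessian approximation
    f, g = loss_and_grad(w, X, y)
    for _ in range(iters):
        if np.linalg.norm(g) < tol:
            break
        p = -H @ g                               # quasi-Newton search direction
        step = 1.0
        while True:                              # Armijo backtracking line search
            w_new = w + step * p
            f_new, g_new = loss_and_grad(w_new, X, y)
            if f_new <= f + 1e-4 * step * (g @ p) or step < 1e-10:
                break
            step *= 0.5
        s, d = w_new - w, g_new - g
        sd = s @ d
        if sd > 1e-12:                           # update only if the curvature condition holds
            rho = 1.0 / sd
            H = (eye - rho * np.outer(s, d)) @ H @ (eye - rho * np.outer(d, s)) + rho * np.outer(s, s)
        w, f, g = w_new, f_new, g_new
    return w


def hybrid_train(X, y, seed=0):
    """SGD warm start, then BFGS refinement; the fixed switch point is a hypothetical rule."""
    rng = np.random.default_rng(seed)
    w = sgd_phase(np.zeros(X.shape[1]), X, y, rng)
    return bfgs_phase(w, X, y)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X = rng.normal(size=(500, 5))
    y = X @ np.array([1.0, -2.0, 0.5, 3.0, 0.0]) + 0.1 * rng.normal(size=500)
    w = hybrid_train(X, y)
    # Compare with the exact least-squares solution.
    print(np.max(np.abs(w - np.linalg.lstsq(X, y, rcond=None)[0])))
```

The SGD phase only has to bring the weights into a region where a local quadratic model is accurate; from there the BFGS inverse-Hessian update gives superlinear local convergence on smooth problems. On this toy problem the result can be checked directly against `np.linalg.lstsq`.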

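Convolution-based integration evaluates a dynamic response as a discretized Duhamel integral, x(t) = ∫ h(t - τ) f(τ) dτ, where h is the unit-impulse response of the structure. The minimal Python sketch below illustrates the idea for a single underdamped degree of freedom with zero initial conditions; it is not the paper's scheme. For a multi-degree-of-freedom finite element model it would presumably be applied mode by mode after a modal decomposition, which is an assumption here. The function names `impulse_response` and `duhamel_response` and all numerical values are hypothetical.

```python
import numpy as np


def impulse_response(t, m, c, k):
    """Unit-impulse response h(t) of an underdamped oscillator m x'' + c x' + k x = f(t)."""
    wn = np.sqrt(k / m)                          # natural circular frequency
    zeta = c / (2.0 * np.sqrt(k * m))            # damping ratio
    if zeta >= 1.0:
        raise ValueError("this sketch assumes an underdamped system (zeta < 1)")
    wd = wn * np.sqrt(1.0 - zeta ** 2)           # damped circular frequency
    return np.exp(-zeta * wn * t) * np.sin(wd * t) / (m * wd)


def duhamel_response(force, dt, m, c, k):
    """Displacement x(t_n) ~ dt * sum_j h(t_n - t_j) f(t_j), starting from rest."""
    n = len(force)
    t = np.arange(n) * dt
    h = impulse_response(t, m, c, k)
    return dt * np.convolve(force, h)[:n]        # keep the causal part of the full convolution


if __name__ == "__main__":
    m, k = 1.0, (2 * np.pi) ** 2                 # 1 Hz natural frequency (hypothetical)
    c = 2 * 0.05 * np.sqrt(k * m)                # 5 % of critical damping (hypothetical)
    dt, n = 0.005, 6000                          # 30 s of response
    force = np.full(n, 2.0)                      # step load of 2 units applied at t = 0
    x = duhamel_response(force, dt, m, c, k)
    print(x[-1], 2.0 / k)                        # late-time displacement vs static value F/k
```

Under a constant load the late-time displacement settles at the static value F/k, which gives a simple check of the convolution. `np.convolve` performs a direct convolution whose cost grows quadratically with the record length; for long records an FFT-based routine such as `scipy.signal.fftconvolve` is the usual substitute.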
Articles published on Convergence Of Stochastic Gradient Descent

Approximating Hessian matrices using Bayesian inference: a new approach for quasi-Newton methods in stochastic optimization

Optimized convergence of stochastic gradient descent by weighted averaging

Modeling nonlinear flutter behavior of long‐span bridges using knowledge‐enhanced long short‐term memory network

Accelerating variance-reduced stochastic gradient methods

Variance Counterbalancing for Stochastic Large-scale Learning

On the Convergence of Stochastic Gradient Descent for Nonlinear Ill-Posed Problems

An SGD-based meta-learner with “growing” descent

Influence of feature scaling on convergence of gradient iterative algorithm

Primal Averaging: A New Gradient Evaluation Step to Attain the Optimal Individual Convergence

Stochastic gradient descent, weighted sampling, and the randomized Kaczmarz algorithm