Gradient-based optimization and control frameworks are used in a wide range of applications. The learning-rate parameter is typically chosen according to a schedule or via methods such as line search to improve the convergence rate. Recently, the machine learning community has developed methods that tune the learning rate automatically, known as adaptive gradient methods. This paper develops a control-theory-inspired framework for modeling adaptive gradient methods that solve non-convex optimization problems. We first model the adaptive gradient methods in a state-space framework, which allows us to present simpler convergence proofs for prominent adaptive optimizers such as AdaGrad, Adam, and AdaBelief. The proposed framework is constructive in that it enables the synthesis of new adaptive optimizers. To illustrate this, we then utilize the transfer-function paradigm from classical control to propose a new variant of Adam, coined AdamSSM, and prove its convergence. Specifically, we add an appropriate pole-zero pair to the transfer function from the squared gradients to the second-moment estimate. Experiments on benchmark machine learning tasks, image classification with CNN architectures and language modeling with an LSTM architecture, demonstrate that AdamSSM narrows the gap between achieving better generalization accuracy and achieving faster convergence, compared with recent adaptive gradient methods.
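For intuition, and purely as an illustrative sketch (the exact pole and zero locations used by AdamSSM are specified in the body of the paper, not here; the parameters \(a\) and \(b\) below are assumptions for illustration), recall that Adam's second-moment recursion \(v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2\) acts as a first-order low-pass filter from the squared gradient \(g_t^2\) to the estimate \(v_t\):
\[
H_{\mathrm{Adam}}(z) \;=\; \frac{V(z)}{G^2(z)} \;=\; \frac{1-\beta_2}{1-\beta_2 z^{-1}}.
\]
Augmenting this filter with one additional pole-zero pair, with illustrative parameters \(a, b \in (0,1)\) and the gain normalized so that the DC gain remains one, gives a filter of the form
\[
H_{\mathrm{SSM}}(z) \;=\; \frac{1-\beta_2}{1-\beta_2 z^{-1}} \cdot \frac{1-b}{1-a} \cdot \frac{1-a z^{-1}}{1-b z^{-1}},
\]
which reshapes how past squared gradients are weighted in the second-moment estimate while preserving its steady-state behavior.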