Improved Penalty Method via Doubly Stochastic Gradients for Bilevel Hyperparameter Optimization

Wanli Shi, Bin Gu

doi:10.1609/aaai.v35i11.17158

Abstract

Hyperparameter optimization (HO) is an important problem in machine learning that is normally formulated as a bilevel optimization problem. Gradient-based methods dominate bilevel optimization because they scale well with the number of hyperparameters, especially in deep learning. However, traditional gradient-based bilevel optimization methods need intermediate steps to obtain the exact or approximate gradient of the upper-level objective with respect to the hyperparameters, namely the hypergradient, and the cost of these steps is high, especially for high-dimensional datasets. Recently, a penalty method was proposed that avoids computing the hypergradient, which speeds up gradient-based bilevel hyperparameter optimization (BHO). However, the penalty method may result in a very large number of constraints, which greatly limits its efficiency, especially on high-dimensional data. To address this limitation, we propose a doubly stochastic gradient descent algorithm (DSGPHO) that improves the efficiency of the penalty method. Importantly, we not only prove that the proposed method converges to a point satisfying the KKT conditions of the original problem in a convex setting, but also provide the convergence rate of DSGPHO, which, as far as we know, is the first such result in the literature on gradient-based bilevel optimization. We compare our method with three state-of-the-art gradient-based methods on three tasks, namely data denoising, few-shot learning, and training data poisoning, using several large-scale benchmark datasets. The results show that our method outperforms or is comparable to the existing methods in terms of accuracy and efficiency.
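
To make the idea of a penalty method with doubly stochastic gradients concrete, here is a minimal toy sketch. It is not the authors' DSGPHO algorithm, and every detail is an assumption made for illustration: the lower level is ridge regression with a single hyperparameter `lam`, the lower-level problem is replaced by its stationarity conditions (one constraint per model coordinate), those constraints are enforced with a quadratic penalty of weight `gamma`, and each step samples both a mini-batch of validation rows and a subset of constraints. All function and parameter names are hypothetical.

```python
import numpy as np

def constraints(lam, w, X, y):
    """Stationarity residuals of the (assumed) ridge lower level:
    c(lam, w) = grad_w [ ||Xw - y||^2 / (2n) + lam * ||w||^2 / 2 ]."""
    return X.T @ (X @ w - y) / len(y) + lam * w

def penalized_objective(lam, w, X, y, Xv, yv, gamma):
    """Validation loss plus a quadratic penalty on every constraint."""
    val_loss = 0.5 * np.mean((Xv @ w - yv) ** 2)
    c = constraints(lam, w, X, y)
    return val_loss + 0.5 * gamma * np.sum(c ** 2)

def doubly_stochastic_grad(lam, w, X, y, Xv, yv, gamma, data_idx, cons_idx):
    """Gradient estimate from sampled validation rows and sampled constraints."""
    d = w.size
    # First source of randomness: mini-batch of validation data.
    Xb, yb = Xv[data_idx], yv[data_idx]
    grad_w = Xb.T @ (Xb @ w - yb) / len(data_idx)
    # Second source of randomness: subset of constraints, rescaled by d / |J|
    # so the estimate is unbiased under uniform sampling.
    c = constraints(lam, w, X, y)[cons_idx]
    scale = gamma * d / len(cons_idx)
    # Jacobian rows d c_j / d w for the sampled constraints.
    jac = X[:, cons_idx].T @ X / len(y)
    jac[np.arange(len(cons_idx)), cons_idx] += lam
    grad_w = grad_w + scale * (jac.T @ c)
    # d c_j / d lam = w_j, and the validation loss does not depend on lam.
    grad_lam = scale * (c @ w[cons_idx])
    return grad_lam, grad_w

def run(X, y, Xv, yv, gamma=1.0, lr=0.01, steps=500, batch=8, n_cons=2, seed=0):
    """Plain SGD on (lam, w) using the doubly stochastic estimate."""
    rng = np.random.default_rng(seed)
    lam, w = 1.0, np.zeros(X.shape[1])
    for _ in range(steps):
        data_idx = rng.choice(len(yv), size=batch, replace=False)
        cons_idx = rng.choice(w.size, size=n_cons, replace=False)
        g_lam, g_w = doubly_stochastic_grad(
            lam, w, X, y, Xv, yv, gamma, data_idx, cons_idx)
        lam = max(lam - lr * g_lam, 0.0)  # keep the ridge weight nonnegative
        w = w - lr * g_w
    return lam, w
```

When every validation row and every constraint is selected, the estimate reduces to the exact gradient of `penalized_objective`, which is a convenient sanity check. Note that no hypergradient is computed anywhere: `lam` and `w` are updated jointly, which is the property the abstract attributes to the penalty approach.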

Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Publication Date: May 18, 2021
Citations: 1

Similar Papers

A General Descent Aggregation Framework for Gradient-Based Bi-Level Optimization
Risheng Liu ... Xiaoming Yuan
IEEE Transactions on Pattern Analysis and Machine Intelligence | VOL. 45
01 Jan 2023

Multi-objective meta-learning
Feiyang Ye ... Ivor W Tsang
Artificial Intelligence | VOL. 335
25 Jul 2024

Value-Function-Based Sequential Minimization for Bi-Level Optimization
Risheng Liu ... Yixuan Zhang
IEEE Transactions on Pattern Analysis and Machine Intelligence | VOL. PP
01 Dec 2023

On the Maximum Achievable Sum-Rate of the RIS-Aided MIMO Broadcast Channel
Nemanja Stefan Perović ... Le-Nam Tran
IEEE Transactions on Signal Processing | VOL. 70
01 Jan 2021