Learning the Optimal Control for Evolving Systems with Converging Dynamics

Qingsong Liu, Zhixuan Fang

doi:10.1145/3673660.3655062

Abstract

We consider a principal (or controller) that picks actions from a fixed action set to control an evolving system with converging dynamics. Converging dynamics means that, if the principal holds the same action, the system asymptotically converges to a unique stable state determined by that action. In our model, the dynamics of the system are unknown to the principal, which receives only bandit feedback (possibly noisy) on the impact of its actions. The principal aims to learn which stable state yields the highest reward while adhering to specific constraints, and to drive the system into this state as quickly as possible. We measure the principal's performance in terms of regret and constraint violation. When the action set is finite, we propose an algorithm, Optimistic-Pessimistic Convergence and Confidence Bounds (OP-C2B), that ensures sublinear regret and constraint violation simultaneously. In particular, OP-C2B achieves logarithmic regret and constraint violation when the system convergence rate is linear or superlinear. Furthermore, we generalize OP-C2B to the case of an infinite action set and show that it maintains sublinear regret and constraint violation.
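To make the setting concrete, the following is a minimal toy sketch of a system with converging dynamics and a naive hold-then-commit learner. It is not OP-C2B, whose details the abstract does not give. The function names, the geometric update rule, the `hold` length, and the reward and cost functions are all assumptions chosen for illustration.

```python
import random


def step(state, target, rate=0.5):
    """Assumed dynamics: the state moves toward the held action's stable
    state `target` at a linear (geometric) rate."""
    return target + rate * (state - target)


def hold_then_commit(stable_states, reward, cost, budget,
                     hold=20, noise=0.0, seed=0):
    """Hold each action for `hold` steps, take one bandit observation of
    reward and cost near the stable state, then pick the best action whose
    observed cost is within `budget`. Hypothetical baseline, not OP-C2B.
    Raises ValueError if no action looks feasible."""
    rng = random.Random(seed)
    state = 0.0  # arbitrary initial state (assumption)
    estimates = {}
    for action, target in stable_states.items():
        for _ in range(hold):
            state = step(state, target)
        # Bandit feedback: only the current state's reward and cost are seen.
        observed_reward = reward(state) + rng.gauss(0.0, noise)
        observed_cost = cost(state) + rng.gauss(0.0, noise)
        estimates[action] = (observed_reward, observed_cost)
    feasible = {a: r for a, (r, c) in estimates.items() if c <= budget}
    best = max(feasible, key=feasible.get)
    return best, estimates


if __name__ == "__main__":
    # Three hypothetical actions and the stable state each one induces.
    stable_states = {"a": 1.0, "b": 2.0, "c": 3.0}
    best, estimates = hold_then_commit(
        stable_states, reward=lambda x: x, cost=lambda x: x * x, budget=5.0
    )
    print(best, estimates)
```

In this toy instance, action `c` has the highest reward but violates the cost budget, so the learner commits to `b`. Holding every action for a fixed number of steps wastes time while the system converges, which is the cost a regret analysis of this setting has to account for.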


More From: ACM SIGMETRICS Performance Evaluation Review

Similar Papers

Learning the Optimal Control for Evolving Systems with Converging Dynamics
Qingsong Liu ... Zhixuan Fang
Proceedings of the ACM on Measurement and Analysis of Computing Systems | VOL. 8
21 May 2024

A Sublinear-Regret Reinforcement Learning Algorithm on Constrained Markov Decision Processes with Reset Action
Takashi Watanabe ... Takashi Sakuragawa
17 Jan 2020

Online Stochastic Optimization With Time-Varying Distributions
Xuanyu Cao ... Junshan Zhang
IEEE Transactions on Automatic Control | VOL. 66
21 May 2020

The augmented Lagrangian method can approximately solve convex optimization with least constraint violation
Yu-Hong Dai ... Liwei Zhang
Mathematical Programming | VOL. 200
17 Jun 2022
