Abstract

This paper considers an online control problem involving two controllers. A central controller chooses an action from a feasible set that is determined by time-varying and coupling constraints, which depend on all past actions and states. The central controller's goal is to minimize the cumulative cost; however, it has direct access to neither the feasible set nor the dynamics, which are determined by a remote local controller. Instead, the central controller receives only an aggregate summary of the feasibility information from the local controller, which does not know the system costs. We show that an online algorithm using only this feasibility information can nearly match the dynamic regret of an online algorithm with perfect information whenever the feasible sets satisfy a causal invariance criterion and the prediction window is sufficiently large. To do so, we use a form of feasibility aggregation based on entropic maximization in combination with a novel online algorithm, named Penalized Predictive Control (PPC), and demonstrate that the aggregated information can be efficiently learned using reinforcement learning algorithms. The effectiveness of our approach for closed-loop coordination between central and local controllers is validated via an electric vehicle charging application in power systems.
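
To make the two ingredients concrete, here is a minimal one-dimensional sketch in Python: a local controller that reports only an aggregate of its feasible interval (for a box with no further structure, the entropy-maximizing distribution is uniform, so its center and width summarize the set), and a central step in the spirit of PPC that penalizes deviation from that aggregate. The function names, the box-shaped feasible set, and the quadratic penalty are illustrative assumptions, not the paper's exact formulation.

```python
from scipy.optimize import minimize_scalar


def aggregate_feasibility(lo_t, hi_t):
    """Local controller (illustrative): summarize the feasible interval [lo_t, hi_t].

    With no further structure, the entropy-maximizing distribution over a box is
    uniform, so a natural aggregate is its center and half-width rather than the
    set itself.
    """
    return 0.5 * (lo_t + hi_t), 0.5 * (hi_t - lo_t)


def ppc_step(cost_t, center, half_width, beta=10.0):
    """Central controller (illustrative): a penalized-predictive-control-style step
    that trades the revealed cost c_t off against a penalty for leaving the
    aggregated region."""
    penalized = lambda u: cost_t(u) + beta * max(0.0, abs(u - center) - half_width) ** 2
    res = minimize_scalar(penalized,
                          bounds=(center - 2 * half_width, center + 2 * half_width),
                          method="bounded")
    return res.x


# One round of the closed loop: the central controller never sees the set [0, 2],
# only the aggregate (center, half_width) reported by the local controller.
c_t = lambda u: (u - 3.0) ** 2        # cost revealed online; unconstrained minimizer is 3
u_t = ppc_step(c_t, *aggregate_feasibility(lo_t=0.0, hi_t=2.0))
print(u_t)                            # lands near the boundary of the feasible interval
```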

Highlights

  • The dynamical system is governed by a local controller, which manages a large fleet of controllable units

  • The collection of the states of the units is represented by xt in a state space X ⊆ Rn. Both the state and action at each time are confined by safety sets that may be time-varying and time-coupling, i.e., xt ∈ Xt (x1, . . . , xt−1, u1, . . . , ut−1) and ut ∈ Ut (x1, . . . , xt−1, u1, . . . , ut−1); a toy sketch of such a set follows this list.
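
As a toy illustration of such time-coupled safety sets, motivated by the electric vehicle charging application mentioned in the abstract, the sketch below computes a per-unit feasible box for the next charging action that depends on the entire action history. The class, numbers, and units are assumptions for illustration only, not the paper's model.

```python
import numpy as np


class ChargingFeasibleSet:
    """Toy time-coupled safety set: the feasible charging rate at time t depends on
    how much energy was already delivered, i.e., on all past actions u1, ..., ut-1."""

    def __init__(self, demand, deadline, rate_limit):
        self.demand = demand          # energy each unit still needs at t = 1 (kWh)
        self.deadline = deadline      # time steps each unit remains plugged in
        self.rate_limit = rate_limit  # maximum charging rate per step (kWh/step)

    def bounds(self, past_actions):
        """Per-unit box [lo_t, hi_t] for the next action, given the action history."""
        delivered = np.sum(past_actions, axis=0) if len(past_actions) else 0.0
        remaining = np.maximum(self.demand - delivered, 0.0)
        hi = np.minimum(self.rate_limit, remaining)                  # cannot overshoot demand
        steps_left = np.maximum(self.deadline - len(past_actions), 1)
        lo = np.maximum(remaining - self.rate_limit * (steps_left - 1), 0.0)  # must finish on time
        return lo, hi


fleet = ChargingFeasibleSet(demand=np.array([10.0, 6.0]),
                            deadline=np.array([4, 3]),
                            rate_limit=3.0)
# The feasible box at t = 2 is coupled to the action taken at t = 1:
print(fleet.bounds(past_actions=[np.array([3.0, 2.0])]))
```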

Summary

Introduction

The central controller receives time-varying cost functions ct online from an external environment, and each ct (·) : U → R+ depends only on the action ut chosen by the central controller. The goal of an online control policy in this setting is to make the local and central controllers jointly minimize the cumulative cost CT (u) := c1 (u1) + · · · + cT (uT) over the horizon T.
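
The closed-loop protocol implied by this setup can be written as a short schematic loop. The interface names below (reveal_cost, local_aggregate, central_policy, apply_action) are assumptions for illustration, not the paper's API.

```python
def run_online_control(T, reveal_cost, local_aggregate, central_policy, apply_action):
    """Schematic online loop: at each step the environment reveals c_t, the local
    controller sends an aggregate of its feasible set, the central controller picks
    u_t, and the cumulative cost C_T(u) = c_1(u_1) + ... + c_T(u_T) accumulates."""
    total_cost, history = 0.0, []
    for t in range(1, T + 1):
        c_t = reveal_cost(t)                    # cost function c_t : U -> R_+, revealed online
        summary = local_aggregate(t, history)   # aggregated feasibility info, not the set itself
        u_t = central_policy(c_t, summary)      # e.g., a penalized predictive control step
        apply_action(t, u_t)                    # local controller executes and enforces safety
        total_cost += c_t(u_t)
        history.append(u_t)
    return total_cost, history


# Tiny usage: scalar actions, a fixed aggregated box [0, 5], and a grid search inside it.
cost, actions = run_online_control(
    T=3,
    reveal_cost=lambda t: (lambda u: (u - t) ** 2),
    local_aggregate=lambda t, hist: (0.0, 5.0),
    central_policy=lambda c, box: min((box[0] + k * (box[1] - box[0]) / 50 for k in range(51)), key=c),
    apply_action=lambda t, u: None,
)
print(cost, actions)   # each u_t tracks the minimizer t of c_t while staying in the box
```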
