Abstract

Modern systems are becoming increasingly complex in their functionality, structure, and dynamics. Successful management of these systems requires enhanced performance in terms of robustness, safety, resiliency, scalability, and usability. To achieve these performance requirements, it is important to adopt cyber-physical system (CPS) design techniques. A CPS features tightly coupled physical and cyber components. The physical components include system dynamics, sensors, controllers, and the uncertain environment in which the system operates. The cyber components include data, communication, control, and computation. The CPS co-design principle suggests that these physical and cyber components should be designed as an integrated whole. CPS studies cross the boundaries of multiple science and engineering disciplines and require deep domain knowledge. CPS applications span intelligent transportation, unmanned systems, smart grids, smart homes, smart health care, smart materials, and intelligent civil infrastructures. Developing practical closed-loop optimal decisions is a common and pivotal task for these CPS applications, and optimal control theory is of significant value in developing such solutions. However, traditional optimal control theory cannot be applied directly because it was developed for systems that do not have the complexity of the modern systems we observe today. Significant challenges exist in developing practical optimal control solutions for real CPSs, given the increased level of complexity and the aforementioned performance requirements. Addressing these challenges requires a seamless integration of optimal control theory with advances from learning and other science and engineering domains. The performance of such integration or co-design is not fully understood or developed. This special issue focuses on optimal control theory and learning for CPSs.
The papers received span broad topics, including learning and data-driven optimal control to address physical unknowns and disturbances; estimation techniques to deal with uncertainties; secure and resilient solutions that account for disturbances, faults, and attacks; control solutions subject to physical constraints on energy, actuation, communication, and computation; and CPS applications in robotics and power grids. The 18 accepted papers are categorized into five directions and summarized as follows.

In the paper by Fu et al. titled "Resilient consensus-based distributed optimization under deception attacks," the authors investigate the problem of distributed optimization subject to L-local deception attacks, in which attackers can modify the information transmitted through at most L communication links to each node. The authors design a resilient consensus-based distributed optimization algorithm, where the nodes cooperatively estimate the optimizers according to their subgradients and the estimates of some of their neighbors. Conditions are developed under which all nodes agree on their estimates and reach a resilient optimal solution. The paper by Zhai and Vamvoudakis studies replay attacks on linear quadratic Gaussian (LQG) zero-sum games. The authors develop a data-based and private learning framework to detect and mitigate replay attacks. A Neyman–Pearson detector is used to detect replay attacks, and optimal watermarking signals are added to aid detection and achieve a trade-off between detection performance and control performance loss. A data-based technique is developed to learn the best defending strategy in the presence of worst-case disturbances, stochastic noise, and replay attacks.

Denial-of-service (DoS) attacks that block information exchange are also common in CPSs. In the paper titled "Event-based resilience to DoS attacks on communication for consensus of networked Lagrangian systems," Li et al. study the resilience of event-based consensus control of networked Lagrangian systems under DoS attacks. An event-based controller is designed in the absence of DoS attacks, and its resilience under DoS attacks is then analyzed. Conditions on the DoS duration and frequency are identified for the resilience of the control system. Periodic energy-limited DoS attacks are considered in the paper titled "Event-triggered output synchronization for nonhomogeneous agent systems with periodic denial-of-service attacks." Xu et al. develop a two-layer control framework for each agent of nonhomogeneous linear dynamics: the first layer is a dynamic compensator that tracks the dynamics of the leader node using an event-triggered protocol, and the second layer is an output regulator that synchronizes to the compensator dynamics. The paper authored by Zhang et al. studies the sliding mode control of a class of interval type-2 fuzzy systems subject to intermittent DoS attacks. A switched type-2 fuzzy estimator is designed that serves as a state observer to estimate unmeasurable states when DoS attacks are absent and as a compensator that generates measurement signals for the control when DoS attacks are in place. A switched sliding mode control is then developed for both the attack-free and attack-active cases, and the acceptable DoS region is characterized using a switched Lyapunov analysis.

The dynamics of the physical components of a CPS may not be completely known. Reinforcement learning is a data-driven adaptive optimal control approach that does not require full knowledge of the physical dynamics. The article authored by Guo et al. studies state-feedback and output-feedback Q-learning of a two-wheeled self-balancing robot (TWSBR). The solution realizes linear quadratic regulation (LQR) control without any knowledge of the system parameters.
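As a rough illustration of how Q-learning can realize LQR control without model knowledge, the following sketch runs policy iteration on a quadratic Q-function estimated purely from data (in the spirit of classical Q-learning for LQR). The double-integrator model, initial gain, and noise levels are illustrative assumptions, not the TWSBR dynamics or the paper's exact algorithm:

```python
import numpy as np

# Illustrative plant (a discrete-time double integrator); the learner never
# uses A and B directly -- they only generate the "measured" data.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Qc, Rc = np.eye(2), np.eye(1)            # quadratic stage-cost weights
n, m = 2, 1

triu = np.triu_indices(n + m)
scale = np.where(triu[0] == triu[1], 1.0, 2.0)

def quad_features(z):
    # z' H z = sum over i <= j of H[i, j] * (1 if i == j else 2) * z_i * z_j
    return np.outer(z, z)[triu] * scale

def theta_to_H(theta):
    # rebuild the symmetric Q-function matrix from its upper-triangle parameters
    U = np.zeros((n + m, n + m))
    U[triu] = theta
    return U + U.T - np.diag(np.diag(U))

rng = np.random.default_rng(0)
K = np.array([[-1.0, -1.7]])             # initial stabilizing gain (assumed given)

for _ in range(8):                       # policy iteration
    rows, targets = [], []
    x = np.array([1.0, -1.0])
    for _ in range(400):                 # policy evaluation from trajectory data
        u = K @ x + 0.5 * rng.standard_normal(m)       # exploration noise
        x_next = A @ x + B @ u
        z = np.concatenate([x, u])
        z_next = np.concatenate([x_next, K @ x_next])  # on-policy next action
        rows.append(quad_features(z) - quad_features(z_next))
        targets.append(x @ Qc @ x + u @ Rc @ u)        # observed stage cost
        x = x_next
    theta, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    H = theta_to_H(theta)
    K = -np.linalg.solve(H[n:, n:], H[n:, :n])         # policy improvement

# Model-based optimal gain via the Riccati recursion, for comparison only.
P = np.eye(n)
for _ in range(1000):
    P = Qc + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(
        Rc + B.T @ P @ B, B.T @ P @ A)
K_opt = -np.linalg.solve(Rc + B.T @ P @ B, B.T @ P @ A)
print("learned K:", K, "optimal K:", K_opt)
```

Because the system is deterministic, the Bellman residual regression recovers the Q-function exactly under persistent excitation, and the learned gain matches the Riccati solution without the learner ever seeing A or B.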
The controls feature a decoupling mechanism and pre-feedback to overcome computational difficulties of Q-learning, and the output-feedback Q-learning avoids discounting factors in the cost function so as to guarantee closed-loop stability. The environment is also a physical component of a CPS: an environmental factor such as a wind field modulates the system dynamics but may be unknown. The paper by He et al. studies minimum time-energy path planning in continuous state and control input spaces in an unknown environment. The authors design an approximate cost function to capture both the minimum time-energy objective and actuation constraints, based on which an integral reinforcement learning (IRL)-based optimal control is developed without knowledge of the environmental disturbance. Convergence of the IRL-based control is proven. Safety is another critical concern for learning-based CPS controls. In the paper titled "Safe reinforcement learning: A control barrier function optimization approach," Marvi and Kiumarsi design a safe reinforcement learning scheme that achieves both safety and optimal control performance. This is accomplished by incorporating a control barrier function into the optimal control cost function without affecting the stability and optimality within the safe region. An off-policy RL algorithm is developed to learn the optimal safe policy without complete knowledge of the system dynamics. To address both the performance optimization objective and the disturbance rejection objective, Yang et al. study a mixed H2/H∞ performance optimization problem for nonlinear systems with polynomial dynamics. The problem is formulated as a nonzero-sum game, and a policy iteration-based framework is developed using the Hamiltonian inequality. A relaxed sum-of-squares-based iterative algorithm is then developed for the mixed optimization problem, which includes both a policy improvement step for the H2 performance and a policy guarantee step for the H∞ performance.

In the paper titled "Deep Koopman model predictive control for enhancing transient stability in power grids," Ping et al. develop a data-driven control framework to address the challenge of nonlinear complexity in power grids. A deep neural network (DNN)-based approximate Koopman operator is used to map the original nonlinear grid dynamics into a finite-dimensional linear space. A model predictive control strategy is then developed to enhance the transient stability of power grids by strategically utilizing energy storage systems in the presence of faults. The paper "Event-triggered distributed model predictive control for resilient voltage control of an islanded microgrid" by Ge et al. addresses the problem of distributed secondary voltage control of a microgrid in islanded mode. An event-triggered distributed model predictive control scheme is designed for voltage regulation with reduced communication and computation loads subject to communication failures. A finite-time adaptive non-asymptotic observer is also designed to address the nonlinear dynamics and to facilitate the output feedback control. Automated demand response (ADR) is used to automatically control customer power consumption. In the paper titled "Stochastic modeling and scalable predictive control for automated demand response," Kobayashi and Hiraishi use Markov chains to capture the complex behavior of power consumption and formulate the ADR problem as a model predictive control problem. To solve the control problem, a mixed integer linear programming solution is developed to choose a control strategy from a finite set. The method scales well with the number of consumers.
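To illustrate the flavor of choosing a control strategy from a finite set via mixed integer linear programming, the following sketch solves a toy one-step demand-response selection problem with SciPy's `milp`. All instance data (loads shed, discomfort costs, reduction target) are invented for illustration and are not taken from the paper:

```python
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

# Toy instance: 3 consumers, each picks exactly one of 3 strategies.
reduction = np.array([[0.0, 1.0, 2.5],    # kW shed by each strategy
                      [0.0, 0.8, 2.0],
                      [0.0, 1.2, 3.0]])
discomfort = np.array([[0.0, 1.0, 4.0],   # discomfort cost of each strategy
                       [0.0, 0.9, 3.5],
                       [0.0, 1.1, 5.0]])
target = 4.0                              # required total load reduction (kW)
n_cons, n_strat = reduction.shape

c = discomfort.ravel()                    # objective: minimize total discomfort
# each consumer selects exactly one strategy (one-hot binary selection)
sel = LinearConstraint(np.kron(np.eye(n_cons), np.ones(n_strat)), 1, 1)
# aggregate shed load must meet the demand-response target
agg = LinearConstraint(reduction.ravel()[None, :], target, np.inf)

res = milp(c=c, constraints=[sel, agg],
           integrality=np.ones_like(c),   # all variables integer (binary via bounds)
           bounds=Bounds(0, 1))
choice = res.x.reshape(n_cons, n_strat).argmax(axis=1)
print("chosen strategy per consumer:", choice, "total discomfort:", res.fun)
```

A receding-horizon version would simply re-solve this selection problem at each control step with updated consumption predictions, which is what makes the MILP formulation attractive for predictive control over discrete strategy sets.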
Uncertainties are common in CPSs, especially in human-CPSs where uncertain human intentions need to be learned. Expert-based ensemble learning algorithms can learn unknown probability distributions online. The paper by Young et al. develops evaluation metrics for N-expert ensemble learning algorithms, termed adaptiveness and consistency. Markov chain analysis is adopted to obtain quantitative relationships between the mean hitting time, adaptiveness, and consistency for three different ensemble learning algorithms, and human-robot interaction studies are conducted to validate the analysis. In the paper titled "Expectation maximization based sparse identification of cyber physical system," Tang and Dong address the identification of hybrid nonlinear CPS models. A two-stage identification algorithm is developed that uses expectation maximization to identify all subsystems in the first step and then discovers the transition logic between subsystems using sparse logistic regression. Hybrid system examples are studied to demonstrate the robustness of the identification approach. In the paper "Stationary target localization and circumnavigation by a non-holonomic differentially driven mobile robot: Algorithms and experiments," Wang et al. consider the problem of circumnavigation when the target location is unknown and only bearing measurements are available. After output feedback linearization, a two-step control algorithm comprising target location estimation and circumnavigation is applied to the dynamics. The estimation and trajectory errors in both steps are proven to converge, and the control is also verified by experimental and simulation studies.

The paper by Battilotti et al. studies distributed infinite-horizon LQG optimal control for networked continuous-time systems. A distributed solution is developed when only local information of the network is available to each node. This is achieved by first designing a distributed LQG controller that depends on network information and then equipping it with a Push-Sum algorithm to compute the network information in a distributed manner. The performance of the proposed control is proven to be arbitrarily close to that of the centralized case. The paper by Chen et al. focuses on the finite-time consensus of second-order multiagent systems with both input saturation and disturbances, and develops distributed controllers using relative position and relative velocity measurements in both the leader-following and leaderless cases. A continuous integral sliding mode method is designed to deal with bounded disturbances. The controller guarantees that the system remains in the sliding mode from any initial state regardless of disturbances and that finite-time consensus can be achieved. The paper by Wang et al. studies distributed sliding mode control for leader-follower formation flight of fixed-wing unmanned aerial vehicles (UAVs) subject to velocity constraints. A distributed sliding mode control law is developed for each UAV under a directed communication graph. Lyapunov theory is used to prove that the error dynamics converge to the sliding mode surface and then to the origin in finite time. The formation can be achieved without requiring an adjustable range of the follower linear velocity.

The Guest Editors would like to thank the Editorial Office and the Editor-in-Chief of the International Journal of Robust and Nonlinear Control, Prof. Michael Grimble, for their support of this Special Issue. Wan and Lewis would also like to acknowledge NSF grants 1714519 and 1839804 for the support of this work. In addition, we thank all the authors who submitted their quality papers, and special thanks go to the anonymous reviewers for their time and effort in completing the reviews.
