Abstract

Learning-based policy optimization methods have shown great potential for building general-purpose control systems. However, existing methods still struggle to achieve complex task objectives while ensuring policy safety during both the learning and execution phases for black-box systems. To address these challenges, we develop data-driven safe policy optimization (D²SPO), a novel reinforcement learning (RL)-based policy improvement method that jointly learns a control barrier function (CBF) for system safety and an LTL-guided RL algorithm for complex task objectives expressed in linear temporal logic (LTL). Unlike many existing works that assume known system dynamics, D²SPO learns a provably safe CBF for black-box dynamical systems by carefully constructing the training datasets and redesigning the loss functions; the learned CBF continuously evolves for improved system safety as the RL agent interacts with the environment. To handle complex task objectives, we exploit the capability of LTL to represent task progress and develop an LTL-guided RL policy for the efficient completion of various tasks with LTL objectives. Extensive numerical and experimental studies demonstrate that D²SPO outperforms most state-of-the-art (SOTA) baselines, achieving an over 95% safety rate and nearly 100% task completion rates. The experiment video is available at https://youtu.be/2RgaH-zcmkY.
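To make the safety mechanism concrete, the sketch below illustrates the standard discrete-time CBF condition that a safety filter of this kind enforces on a policy's actions. It is not the paper's D²SPO implementation (which learns the CBF from data for a black-box system); the barrier function `h`, the stand-in `dynamics`, and the decay rate `GAMMA` are all illustrative assumptions.

```python
import numpy as np

# Illustrative sketch (not the paper's D2SPO method): a discrete-time CBF
# safety filter layered on top of an arbitrary RL policy action.
# Safe set: C = {x : h(x) >= 0}. An action a is accepted only if it keeps
# h(f(x, a)) >= (1 - GAMMA) * h(x); otherwise the nearest certified
# candidate action is used instead.

GAMMA = 0.1  # CBF decay rate in (0, 1]; hypothetical choice


def h(x):
    """Barrier function: keep the state inside the unit disk."""
    return 1.0 - np.dot(x, x)


def dynamics(x, a):
    """Stand-in single-integrator model; the real system is black-box."""
    return x + 0.1 * a


def safety_filter(x, a_rl, candidates):
    """Return a_rl if it satisfies the CBF condition, else the nearest safe candidate."""
    if h(dynamics(x, a_rl)) >= (1.0 - GAMMA) * h(x):
        return a_rl
    safe = [a for a in candidates if h(dynamics(x, a)) >= (1.0 - GAMMA) * h(x)]
    if not safe:
        return a_rl  # no certified action found; fall back to the policy action
    return min(safe, key=lambda a: np.linalg.norm(a - a_rl))


# Example: filter a policy action that pushes toward the safe-set boundary.
x = np.array([0.8, 0.0])
a_rl = np.array([2.0, 0.0])
grid = [np.array([u, v]) for u in np.linspace(-2, 2, 9)
        for v in np.linspace(-2, 2, 9)]
print(safety_filter(x, a_rl, grid))
```

In D²SPO itself the CBF is learned rather than hand-specified, and the RL reward is additionally shaped by progress through an automaton representation of the LTL task; the sketch only shows the safety-constraint side of that picture.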
