Abstract

We propose an algorithmic pipeline that enables deep reinforcement learning controllers to detect when a significant change in system characteristics has occurred and to update the control policy accordingly, restoring performance. Reinforcement learning algorithms can learn a policy directly from input–output data and thus optimize for system-specific properties, yet they struggle to adapt to varying operating conditions after deployment. Real-world industrial mechatronic systems, however, demand further levels of performance through adaptation while remaining safe. Methods that detect changes in environments exist, but they have not previously been studied or applied as a means to update control policies for time-varying systems. We benchmark several methods that detect significant changes in these systems, i.e. shiftpoint detection methods, and present a novel algorithm with a dual regularization architecture. This architecture exploits the prior policy while retaining enough flexibility to adapt to the safety-critical, time-varying system. We validate the method’s performance through benchmarking and study the effect of its individual components through targeted ablation studies on mechatronic systems, both in simulation and experimentally. Results show that our algorithmic pipeline allows for rapid shiftpoint detection, followed by a policy update that reaches expert performance after convergence.
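The pipeline's first stage, shiftpoint detection, can be illustrated with a classical two-sided CUSUM test on a stream of model residuals. This is a minimal sketch only: the paper benchmarks several detection methods, and the function name, threshold, and drift values below are hypothetical, not the paper's.

```python
def cusum_shiftpoint(residuals, threshold=5.0, drift=0.5):
    """Two-sided CUSUM change detector over a stream of model residuals.

    Returns the first index where the cumulative deviation from the
    nominal (zero-mean) regime exceeds `threshold`, or None if no shift
    is detected. Threshold and drift are illustrative, not tuned values.
    """
    g_pos = g_neg = 0.0
    for i, r in enumerate(residuals):
        g_pos = max(0.0, g_pos + r - drift)  # accumulate upward deviation
        g_neg = max(0.0, g_neg - r - drift)  # accumulate downward deviation
        if g_pos > threshold or g_neg > threshold:
            return i
    return None

# Residual stream: nominal behaviour, then a mean shift at index 100
# (e.g. the plant's dynamics change and the model's residuals jump).
stream = [0.0] * 100 + [3.0] * 100
print(cusum_shiftpoint(stream))  # → 102 (two steps after the true shift)
```

A detection like this would trigger the second stage of the pipeline, the regularized policy update described in the abstract.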
