Abstract

Model-free control approaches require advanced exploration-exploitation policies to achieve practical tasks such as learning to bipedal robot walk in unstructured environments. In this article, we first construct a comprehensive exploration-exploitation policy that carries quality knowledge about the long-term predictor and the control policy, and the control signal of the model-free algorithms. Therefore, the developed model-free algorithm continues exploration by adjusting its unknown parameters until the desired learning and control are accomplished. Second, we provide an utterly model-free adaptive law enriched with the exploration-exploitation policy and derived step-by-step using the exact analogy of the model-based solution. The obtained adaptive control law considers the control signal saturation and the control signal (input) delay. Performed Lyapunov stability analysis ensures the convergence of the adaptive law that can also be used for intelligent control approaches. Third, we implement the adaptive algorithm in real time on a challenging benchmark system: a fourth-order, coupled dynamics, input saturated, and time-delayed underactuated manipulator. The results show that the proposed adaptive algorithm explores larger state-action spaces and treats the vanishing gradient problem in both learning and control. Also, we notice from the results that the learning and control properties of the adaptive algorithm are optimized as required.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call