Abstract

We present a class of reinforcement learning (RL) controllers of the actor-critic type, focusing on the direct heuristic dynamic programming (dHDP) method. Over the past several years, new analysis and synthesis of dHDP as a reinforcement learning controller, as well as impressive applications of dHDP, have emerged. In this chapter we summarize how dHDP works, what analytical properties it possesses, and how it was applied and implemented in a wearable robot for the automatic tuning of prosthesis control parameters with a human user in the loop.

Keywords: Reinforcement learning; Adaptive/approximation dynamic program; Optimal control; Stochastic gradient descent; Adaptive optimal control; Dynamic programming; Bellman optimality; Direct heuristic dynamic programming; Robotic prosthesis control
