Abstract

The reward policy is a crucial part of Deep Reinforcement Learning (DRL) applications in robotics. Building autonomous systems with “human-like” behavior creates a significant need for better, faster, and more robust training based on an optimized reward function. Inspired by work from Berkeley and Google, this paper presents our recent development in reward policy/function design. In particular, we have formulated an accelerated reward policy (ARP) based on a nonlinear function. We applied this reward function to the Soft Actor-Critic (SAC) algorithm to train a 6 DoF (Degree of Freedom) robot in a simulated environment built on the Unity gaming platform. The nonlinear ARP function gives a larger reward to accelerate the robot’s positive behavior during training. Compared with the existing algorithm, our experiments demonstrated faster convergence and a higher cumulative reward. Although the experimental data are limited, the results show a cumulative reward up to 2 times that of the previous results.

Keywords: Deep Reinforcement Learning, Machine learning, Autonomous systems, 6 DoF robot, Unity
