Shaping Rewards Research Articles

This work develops a deep reinforcement learning approach for six degree-of-freedom (6-DOF) planetary powered descent and landing. Future Mars missions will require advanced guidance, navigation, and control algorithms for the powered descent phase to target specific surface locations and achieve pinpoint accuracy (landing error ellipse <5 m radius). This requires both a navigation system capable of estimating the lander’s state in real time and a guidance and control system that can map the estimated lander state to a commanded thrust for each lander engine. In this paper, we present a novel integrated guidance and control algorithm designed by applying the principles of reinforcement learning theory. Reinforcement learning is used to learn a policy mapping the lander’s estimated state directly to a commanded thrust for each engine, resulting in accurate and almost fuel-optimal trajectories over a realistic deployment ellipse. Specifically, we use proximal policy optimization, a policy gradient method, to learn the policy. Another contribution of this paper is the use of different discount rates for terminal and shaping rewards, which significantly enhances optimization performance. We present simulation results demonstrating the guidance and control system’s performance in a 6-DOF simulation environment and show robustness to noise and system parameter uncertainty.
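
The idea of discounting terminal and shaping rewards at different rates can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the example discount values, and the choice to sum the two return streams are assumptions made for illustration only. The sketch computes a discounted return separately for each reward stream and adds them, so that a sparse terminal reward can be discounted more gently than dense shaping rewards.

```python
from typing import List, Sequence


def discounted_returns(rewards: Sequence[float], gamma: float) -> List[float]:
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the return of the following step.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def dual_discount_returns(
    shaping: Sequence[float],
    terminal: Sequence[float],
    gamma_shaping: float = 0.95,   # hypothetical value, not from the paper
    gamma_terminal: float = 0.995,  # hypothetical value, not from the paper
) -> List[float]:
    """Sum of two return streams, each discounted at its own rate (assumed scheme)."""
    if len(shaping) != len(terminal):
        raise ValueError("reward streams must have the same length")
    g_shaping = discounted_returns(shaping, gamma_shaping)
    g_terminal = discounted_returns(terminal, gamma_terminal)
    return [gs + gt for gs, gt in zip(g_shaping, g_terminal)]


if __name__ == "__main__":
    # Dense shaping reward at every step, terminal reward only at the last step.
    shaping = [1.0, 1.0, 1.0]
    terminal = [0.0, 0.0, 10.0]
    print(dual_discount_returns(shaping, terminal, gamma_shaping=0.5, gamma_terminal=1.0))
```

In a policy gradient setting such as proximal policy optimization, returns of this kind would typically feed the advantage estimate; how the paper combines the two streams in practice is not stated in the abstract.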

In some robotic manipulation environments, hand-programming a robot behavior is often intractable because it is difficult to model the dynamics precisely and to compute features that adequately describe the variety of scene appearances. Deep reinforcement learning methods partially alleviate this problem in that they can dispense with hand-crafted features for the state representation and do not need pre-computed dynamics. However, they often use prior information in the task definition in the form of shaping rewards, which guide the robot toward goal state areas but require engineering or human supervision and can lead to sub-optimal behavior. In this work we consider a complex robot reaching task with a large range of initial object positions and initial arm positions and propose a new learning approach with minimal supervision. Inspired by developmental robotics, our method consists of a weakly-supervised stage-wise procedure of three tasks. First, the robot learns to fixate the object with a 2-camera system. Second, it learns hand-eye coordination by learning to fixate its end-effector. Third, using the knowledge acquired in the previous steps, it learns to reach the object at different positions and from a large set of initial robot joint angles. Experiments in a simulated environment show that our stage-wise framework yields reaching performance similar to that of a supervised setting, without using kinematic models, hand-crafted features, calibration parameters, or supervised visual modules.

Articles published on Shaping Rewards

ASSESSING TROPHIC STATE AND WATER QUALITY OF SMALL LAKES AND PONDS IN PERAK

FUSION SPARSE AND SHAPING REWARD FUNCTION IN SOFT ACTOR-CRITIC DEEP REINFORCEMENT LEARNING FOR MOBILE ROBOT NAVIGATION

Utilizing Reinforcement Learning to Continuously Improve a Primitive-Based Motion Planner

Deep reinforcement learning for six degree-of-freedom planetary landing

Stage-Wise Learning of Reaching Using Little Prior Knowledge.

Imitation Learning with Demonstrations and Shaping Rewards

Adaptive Cruise Control Based on Reinforcement Leaning with Shaping Rewards

Online learning of shaping rewards in reinforcement learning

Co-evolution of Shaping Rewards and Meta-Parameters in Reinforcement Learning

Potential-Based Shaping and Q-Value Initialization are Equivalent
