Abstract

A variety of reinforcement learning (RL) methods have been developed to achieve motion control for robotic systems, which remains an active research topic. However, the performance of conventional RL methods often hits a bottleneck because the exploration-exploitation dilemma makes it difficult for robots to choose appropriate actions in control tasks. To address this problem and improve learning performance, this work introduces an experience-aggregative reinforcement learning method with Multi-Attribute Decision-Making (MADM) to achieve real-time obstacle avoidance for a wheeled mobile robot (WMR). The proposed method clusters experiential samples through experience aggregation, enabling more effective experience storage. Moreover, to select actions effectively using prior experience, an action selection policy based on MADM is proposed. Inspired by hierarchical decision-making, this work decomposes the original obstacle avoidance task into sub-tasks using a divide-and-conquer approach. Each sub-task is trained individually by double Q-learning with a simple reward function, and each learns an action policy that selects appropriate actions to achieve a single goal. When the sub-tasks are fused, their rewards are standardized to eliminate differences in reward scales across sub-tasks. The proposed method then integrates the prior experience of the trained sub-tasks via a MADM-based action policy to complete the source task. Simulation results show that the proposed method outperforms competing approaches.
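To make the per-sub-task training concrete, the following is a minimal sketch of one double Q-learning update step, the algorithm the abstract names for training each sub-task. All identifiers (`double_q_update`, the `(state, action)` table layout) are illustrative, not the paper's actual implementation.

```python
import random
from collections import defaultdict

def double_q_update(QA, QB, s, a, r, s_next, actions, alpha=0.1, gamma=0.95):
    """One double Q-learning update (illustrative names).

    Two Q-tables are updated in alternation: one table chooses the
    greedy next action while the other evaluates it, which reduces the
    overestimation bias of standard Q-learning.
    """
    if random.random() < 0.5:
        a_star = max(actions, key=lambda x: QA[(s_next, x)])  # QA selects
        QA[(s, a)] += alpha * (r + gamma * QB[(s_next, a_star)] - QA[(s, a)])
    else:
        b_star = max(actions, key=lambda x: QB[(s_next, x)])  # QB selects
        QB[(s, a)] += alpha * (r + gamma * QA[(s_next, b_star)] - QB[(s, a)])

# Q-tables for one sub-task, keyed by (state, action) and defaulting to 0
QA, QB = defaultdict(float), defaultdict(float)
double_q_update(QA, QB, "s0", "left", 1.0, "s1", ["left", "right"])
```

Because all next-state values start at zero, a single update with reward 1.0 moves exactly one of the two tables by `alpha * r = 0.1`, whichever branch was drawn.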

Highlights

  • REINFORCEMENT LEARNING FOR ROBOTICS: Reinforcement learning is a type of machine learning method for intelligent robots that allows them to automatically explore the environment via a trial-and-error approach [5]

  • Reinforcement learning is suited to a class of robot control problems in the statistics and control fields, and it can learn to complete complex behaviors from feedback over repeated attempts [31]

  • The ordered weighted averaging (OWA) operator assigns weights to each attribute according to its importance, and it is effective and easy to implement among aggregation operators
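The OWA operator mentioned above can be sketched in a few lines. The key property is that weights attach to ordered positions (largest value first), not to particular attributes; the function name and example numbers below are illustrative only.

```python
def owa(values, weights):
    """Ordered weighted averaging: sort attribute values in descending
    order, then take the weighted sum with position-wise weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have equal length")
    return sum(w * v for w, v in zip(weights, sorted(values, reverse=True)))

# Three attribute scores aggregated with descending rank weights:
# 0.5*0.9 + 0.3*0.5 + 0.2*0.2 = 0.64
score = owa([0.2, 0.9, 0.5], [0.5, 0.3, 0.2])
```

Note that permuting the input values leaves the result unchanged, which distinguishes OWA from an ordinary weighted average.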


Summary

INTRODUCTION

Previous research on obstacle avoidance tasks balances exploration and exploitation with probabilistic exploration methods such as epsilon-greedy [11]. This study instead combines experience aggregation with an action policy that exploits the prior experience of trained sub-tasks: the MADM-based policy regards the action space as the scheme set and the value functions from the trained sub-tasks as attributes of these schemes. To address the exploration-exploitation dilemma in the obstacle avoidance task, this work therefore proposes an action policy with experience aggregation and MADM, which selects an action based on the learned experience of the sub-tasks.
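The MADM-style selection described above can be sketched as follows: each action is a "scheme", each trained sub-task's Q-values form one "attribute", and per-attribute values are min-max normalized before a weighted aggregation (standing in for the paper's reward standardization). All names, weights, and the normalization choice here are assumptions for illustration, not the paper's exact formulation.

```python
def madm_select(sub_q, weights):
    """Pick an action by Multi-Attribute Decision-Making.

    sub_q:   dict sub_task -> {action: Q-value}  (one attribute per sub-task)
    weights: dict sub_task -> importance weight
    """
    actions = next(iter(sub_q.values())).keys()

    def norm(qs):
        # Min-max normalize one sub-task's Q-values so that differing
        # reward scales cannot dominate the aggregated score.
        lo, hi = min(qs.values()), max(qs.values())
        return {a: (q - lo) / (hi - lo) if hi > lo else 0.0
                for a, q in qs.items()}

    normed = {t: norm(qs) for t, qs in sub_q.items()}
    score = {a: sum(weights[t] * normed[t][a] for t in sub_q)
             for a in actions}
    return max(score, key=score.get)

# Two hypothetical sub-tasks (goal-reaching vs. obstacle avoidance)
# disagree about the best action; the weights arbitrate.
choice = madm_select(
    {"goal": {"L": 0.0, "R": 1.0}, "avoid": {"L": 1.0, "R": 0.0}},
    {"goal": 0.7, "avoid": 0.3},
)
```

With these weights the goal-reaching attribute dominates (score 0.7 for "R" vs. 0.3 for "L"), so the fused policy picks "R".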

Q-LEARNING
THE EPSILON-GREEDY STRATEGY
THE RL MODEL FOR OBSTACLE AVOIDANCE OF WHEELED MOBILE ROBOT
THE TRAINING ALGORITHM FOR EACH SUB-TASK
Initialization
EXPERIENCE AGGREGATION IN THE EXPLORATION PROCESS
CONCLUSION