Abstract

To address the formation and path planning of multirobot systems in an unknown environment, a path planning method for multirobot formation based on improved Q-learning is proposed. Following the leader-following approach, the leader robot plans its path with an improved Q-learning algorithm, while the follower robots track it using a gravitational potential field (GPF) strategy that selects actions through a designed cost function. Specifically, to improve Q-learning, the Q-values are initialized with environmental guidance from the target's GPF. Then, a virtual obstacle-filling avoidance strategy is presented, which fills free cells judged likely to lead into concave obstacles with virtual obstacles. In addition, the simulated annealing (SA) algorithm, whose control temperature is adjusted in real time according to the learning progress of Q-learning, is applied to improve the action selection strategy. The experimental results show that the improved Q-learning algorithm reduces the convergence time by 89.9% and the number of convergence rounds by 63.4% compared with the traditional algorithm. With this method, multiple robots achieve a clear division of labor and quickly plan a globally optimized formation path in a completely unknown environment.
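For a concrete picture of the GPF-based initialization mentioned above, the sketch below biases a fresh Q-table toward the goal by seeding every state with a potential that grows less negative as the distance to the target shrinks. The linear-in-distance form and the gain k are illustrative assumptions; the paper's exact initialization formula may differ.

```python
import numpy as np

def init_q_with_gpf(grid_shape, goal, n_actions, k=0.5):
    """Initialize a Q-table from the goal's gravitational potential field.

    Each state's initial Q-values are seeded from a potential that rises
    (becomes less negative) as the robot nears the goal, so early
    exploration is biased toward the target instead of starting from an
    all-zero table. The -k * distance form is an assumption.
    """
    rows, cols = grid_shape
    Q0 = np.zeros((rows * cols, n_actions))
    for r in range(rows):
        for c in range(cols):
            d = np.hypot(r - goal[0], c - goal[1])  # Euclidean distance to goal
            Q0[r * cols + c, :] = -k * d            # higher value near the goal
    return Q0
```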

Highlights

  • As robots are used ever more widely across industries, a single robot is often insufficient for complex tasks

  • Classical Q-learning (Algorithm 1): for each episode, repeat until a terminal state is reached: (1) select an action a_t at state s_t according to the ε-greedy action selection strategy; (2) execute a_t, enter state s_{t+1}, and receive the immediate reward r_t from the environment; (3) update the value function as Q(s_t, a_t) = Q(s_t, a_t) + α[r_t + γ max_a Q(s_{t+1}, a) − Q(s_t, a_t)]; (4) s_t ← s_{t+1}; then increment the episode counter (a runnable sketch follows after this list)

  • The steps of the tracking strategy based on the gravitational potential field (GPF) for the follower robot are as follows (see the cost-function sketch after this list): Step 1: if the follower robot receives the coordinates broadcast by the leader robot, it determines its target state according to the formation, i.e., the desired target position at that moment
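As a concrete illustration of Algorithm 1, here is a minimal, runnable Python sketch of classical tabular Q-learning. The environment interface (reset() returning a state index and step(action) returning (next_state, reward, done)) and the hyperparameter values are assumptions for illustration, not the paper's implementation.

```python
import random
import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.9, epsilon=0.1):
    """Classical tabular Q-learning with epsilon-greedy action selection."""
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s = env.reset()                      # start state index
        done = False
        while not done:
            # (1) epsilon-greedy: explore with probability epsilon
            if random.random() < epsilon:
                a = random.randrange(n_actions)
            else:
                a = int(np.argmax(Q[s]))
            # (2) interact with the environment, observe reward
            s_next, r, done = env.step(a)
            # (3) Q(s,a) <- Q(s,a) + alpha*[r + gamma*max_a' Q(s',a') - Q(s,a)]
            Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
            # (4) advance to the next state
            s = s_next
    return Q
```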
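The excerpt shows only Step 1 of the follower's GPF tracking strategy. Below is a hedged sketch of how the cost-based action selection could look on a grid: the follower evaluates each candidate move against an attractive potential centered on its desired formation slot and picks the cheapest free cell. The quadratic potential, the 8-connected action set, and the obstacle handling are assumptions, not necessarily the paper's exact cost function.

```python
import numpy as np

# 8-connected candidate moves on a grid (illustrative action set)
ACTIONS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
           (0, 1), (1, -1), (1, 0), (1, 1)]

def select_follower_action(pos, target, occupied, k_att=1.0):
    """Pick the move that minimizes an attractive-potential cost.

    U_att(p) = 0.5 * k_att * ||p - target||^2 pulls the follower toward
    its desired formation slot; cells in `occupied` (obstacles or other
    robots) are skipped. The quadratic form is a standard assumption.
    """
    best_action, best_cost = None, float("inf")
    for dx, dy in ACTIONS:
        p = (pos[0] + dx, pos[1] + dy)
        if p in occupied:                      # blocked cell, not a candidate
            continue
        d = np.hypot(p[0] - target[0], p[1] - target[1])
        cost = 0.5 * k_att * d ** 2            # attractive (gravitational) potential
        if cost < best_cost:
            best_action, best_cost = (dx, dy), cost
    return best_action
```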

Summary

Introduction

As robots are used ever more widely across industries, a single robot is often insufficient for complex tasks. Building on single-robot path planning, Sruthi et al. [11] designed a nonlinear tracking controller to achieve multirobot formation. By combining formation control with leader-following and priority methods, Sang et al. [12] used the MTAPF algorithm with an improved A∗ algorithm for path planning. The above methods all initialize the Q-value with prior information to improve the algorithm, without considering the avoidance of concave obstacles or the adjustment of the action selection strategy. The innovations of this paper are as follows: an improved Q-learning algorithm is presented for path planning, in which environmental guidance and a virtual obstacle-filling avoidance strategy are added to accelerate convergence, and the SA algorithm is applied to improve the action selection strategy; in addition, the follower robots achieve the GPF tracking strategy by selecting actions through a designed cost function
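The excerpt does not spell out how the SA algorithm adjusts the action selection. A common realization, shown below as an assumption-laden sketch, samples actions from a Boltzmann distribution over Q-values and cools the temperature as learning progresses, so exploration dominates early and exploitation late. The geometric cooling schedule T_k = T0 · decay^k stands in for the paper's real-time temperature adjustment.

```python
import numpy as np

def boltzmann_action(q_row, temperature, rng=None):
    """Sample an action with probability proportional to exp(Q / T).

    High T -> near-uniform exploration; low T -> near-greedy choice.
    """
    if rng is None:
        rng = np.random.default_rng()
    z = q_row / max(temperature, 1e-8)   # scale Q-values by temperature
    z = z - z.max()                      # shift for numerical stability
    probs = np.exp(z) / np.exp(z).sum()  # softmax over actions
    return int(rng.choice(len(q_row), p=probs))

# Illustrative cooling schedule (assumption): T_k = T0 * decay**k
T0, decay = 10.0, 0.99
temperatures = [T0 * decay ** k for k in range(500)]
```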

Related Methods
Improved Q-Learning Proposed for Path Planning of Leader Robot
A Path Planning Method for Multirobot Formation
Experiments and Analysis
Findings
Conclusion
