In the machine learning literature, reinforcement learning is commonly positioned between supervised and unsupervised learning. In this paradigm, the learning agent receives a reward or punishment from the environment according to its actions; by interacting with the environment through trial and error, it learns to choose the optimal action for achieving its goal. Eligibility traces are one of the main mechanisms in reinforcement learning for handling delayed rewards. With conventional one-step reinforcement learning methods, reaching a goal updates only the value of the last state-action pair, whereas with eligibility traces every state-action pair along the visited trace is updated; in other words, the delayed reward is distributed throughout the trace. Much like the pheromone trails of ants, this mechanism has been shown empirically to increase learning speed to some extent. A soccer robot on the field encounters moving obstacles such as the ball, opposing robots, and teammate robots, as well as fixed obstacles such as the goals and flags, so its environment is highly dynamic. Obstacle avoidance is therefore an important problem for autonomous soccer robots and must be addressed during real play. The main idea of this study is to determine appropriate traces that allow the robot to move toward the ball and ultimately score, while avoiding obstacles, in a simulation of a real soccer match. The results obtained from a played game indicate a high level of online learning and decision-making ability in the face of new situations.
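To make the trace mechanism concrete, below is a minimal sketch of a tabular Sarsa(lambda) update, a standard eligibility-trace method; the abstract does not specify which algorithm the authors use, and all state-action space sizes and hyperparameters here are assumed for illustration only.

    import numpy as np

    # Illustrative sketch of tabular Sarsa(lambda); not the paper's implementation.
    # State/action counts and hyperparameters are assumed values.
    n_states, n_actions = 100, 4
    alpha, gamma, lam = 0.1, 0.95, 0.9  # step size, discount, trace decay

    Q = np.zeros((n_states, n_actions))  # action-value estimates
    E = np.zeros((n_states, n_actions))  # eligibility traces

    def sarsa_lambda_step(s, a, r, s_next, a_next):
        """One update: the TD error is credited to every state-action pair
        along the trajectory in proportion to its eligibility trace."""
        delta = r + gamma * Q[s_next, a_next] - Q[s, a]  # TD error
        E[s, a] += 1.0               # accumulating trace for the visited pair
        Q[:] += alpha * delta * E    # all traced pairs share the delayed reward
        E[:] *= gamma * lam          # traces fade, like evaporating pheromone

Note how a single TD error updates every pair with a nonzero trace, so a delayed reward received at the goal propagates back along the path the robot took, while the decaying trace plays the role of evaporating pheromone.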