Abstract

To address the tradeoff between exploration and exploitation actions in reinforcement learning, the authors have proposed two-dimensional evaluation reinforcement learning, which maintains separate evaluation forecasts for reward and for punishment. The proposed method uses the difference between the reward evaluation and the punishment evaluation as the factor that determines the action, and their sum as the parameter that determines the ratio of exploration to exploitation. In this paper we describe an experiment in which a mobile robot searches for a path and therefore faces a conflict between exploration and exploitation actions. The results of the experiment show that reinforcement learning with the two dimensions of reward and punishment can generate a better path than the conventional reinforcement learning method.

Keywords: Artificial Intelligence, Mobile Robot, Problem Complexity, Learning Method, Exploitation Action

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
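The action-selection idea stated in the abstract can be illustrated with a minimal sketch. The abstract says only that the difference of the two evaluations determines the action and that their sum determines the exploration ratio; it does not give the formulas. Everything else here is an assumption made for illustration: the function names, the treatment of both evaluations as non-negative magnitudes, and in particular the mapping `epsilon = 1 / (1 + mean sum)`, which is not taken from the paper.

```python
import random


def exploration_rate(reward_eval, punish_eval):
    """Map the sum of the two evaluations to an exploration probability.

    ASSUMPTION: both evaluations are non-negative magnitudes, and the
    mapping 1 / (1 + mean sum) is a hypothetical choice, not the paper's.
    A small sum (little experience of either kind) gives a rate near 1.
    """
    sums = [r + p for r, p in zip(reward_eval, punish_eval)]
    return 1.0 / (1.0 + sum(sums) / len(sums))


def greedy_action(reward_eval, punish_eval):
    """Return the action index with the largest (reward - punishment)."""
    diffs = [r - p for r, p in zip(reward_eval, punish_eval)]
    return max(range(len(diffs)), key=diffs.__getitem__)


def select_action(reward_eval, punish_eval, rng=random):
    """Explore with a probability set by the sum, else exploit the difference."""
    if rng.random() < exploration_rate(reward_eval, punish_eval):
        return rng.randrange(len(reward_eval))  # explore: uniform random action
    return greedy_action(reward_eval, punish_eval)  # exploit: best difference


if __name__ == "__main__":
    reward = [0.0, 9.0]   # hypothetical per-action reward evaluations
    punish = [9.0, 0.0]   # hypothetical per-action punishment evaluations
    print(exploration_rate(reward, punish))
    print(select_action(reward, punish, random.Random(0)))
```

Under this assumed mapping, a state with no accumulated reward or punishment is explored with probability 1, while a state with large evaluations of either kind is mostly exploited. Other monotone mappings from the sum to an exploration rate would serve equally well for the illustration.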
