Abstract

An Unmanned Aerial Vehicle (UAV) can greatly reduce the manpower required for agricultural plant protection tasks such as watering, sowing, and pesticide spraying. It is therefore essential to develop a Decision-making Support System (DSS) that helps a UAV choose the correct action in each state according to a policy. In an unknown environment, hand-crafting rules for action selection is not applicable, and obtaining the optimal policy through reinforcement learning is a feasible alternative. However, experiments show that existing reinforcement learning algorithms cannot obtain the optimal policy for a UAV in the agricultural plant protection environment. In this work we propose an improved Q-learning algorithm based on similar state matching, and we prove theoretically that, in this environment, a UAV following the policy learned by the proposed algorithm has a greater probability of choosing the optimal action than one following the policy learned by the classic Q-learning algorithm. The proposed algorithm is implemented and tested on evenly distributed datasets based on real UAV parameters and real farm information. The performance of the algorithm is evaluated and discussed in detail. Experimental results show that the proposed algorithm can efficiently learn the optimal policy for UAVs in the agricultural plant protection environment.

Highlights

  • We take the history record of the Unmanned Aerial Vehicle (UAV)'s actions as the current state, so different states may have different lengths. Through experiments we find that the classic Q-learning algorithm and Deep Reinforcement Learning (DeepRL) algorithms are not suitable for solving problems in this environment

  • We propose an Approximate State Matching Q-learning algorithm that can obtain the optimal policy for UAVs

  • We analyze the performance of the proposed algorithm and prove, through theorems, its advantages over the classic Q-learning algorithm in the agricultural plant protection environment
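The idea in the highlights above can be illustrated with a minimal sketch: states are variable-length tuples of past actions, and when an unseen state is encountered, the learner falls back to the most similar state it has already visited. The paper's actual matching rule is not given in this excerpt, so everything specific below is an assumption for illustration only: the action names, the common-suffix `similarity` measure, the `threshold` value, and the helper names `match_state`, `boltzmann_choice`, and `q_update` are all hypothetical.

```python
import math
import random

# Hypothetical action set; the real UAV action set is not given in this excerpt.
ACTIONS = ("water", "sow", "spray")


def similarity(s1, s2):
    """Hypothetical similarity between two action histories:
    length of their common suffix divided by the longer length."""
    common = 0
    for a, b in zip(reversed(s1), reversed(s2)):
        if a != b:
            break
        common += 1
    longest = max(len(s1), len(s2))
    return common / longest if longest else 1.0


def match_state(q_table, state, threshold=0.5):
    """Return `state` if it was visited before; otherwise the most similar
    visited state whose similarity is at least `threshold`, else None."""
    if state in q_table:
        return state
    best, best_sim = None, threshold
    for known in q_table:
        sim = similarity(state, known)
        if sim >= best_sim:
            best, best_sim = known, sim
    return best


def boltzmann_choice(q_values, temperature=1.0, rng=random):
    """Sample an action with probability proportional to exp(Q / temperature)."""
    top = max(q_values.values())  # subtract the max for numerical stability
    weights = [math.exp((q_values[a] - top) / temperature) for a in ACTIONS]
    return rng.choices(ACTIONS, weights=weights, k=1)[0]


def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Classic Q-learning update, bootstrapping from the matched next state
    (or from 0.0 when no sufficiently similar state has been visited)."""
    matched = match_state(q_table, next_state)
    next_best = max(q_table[matched].values()) if matched is not None else 0.0
    row = q_table.setdefault(state, {a: 0.0 for a in ACTIONS})
    row[action] += alpha * (reward + gamma * next_best - row[action])
    return row[action]
```

In this sketch the Q-table is a plain dictionary keyed by action-history tuples, so states of different lengths can coexist without padding. A learner would call `match_state` before `boltzmann_choice` to reuse the Q-values of a similar visited state when the exact history has not been seen.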

Summary

Introduction

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Deep Reinforcement Learning (DeepRL) [4] has achieved remarkable results in many research areas such as physics-based animation, robotics, computer vision, and games. It aims at finding an optimal policy that maximizes cumulative rewards and is well suited to problems with continuous and high-dimensional states and actions [5]. We study the problem of forming a policy for UAVs through reinforcement learning in the agricultural plant protection environment. This environment serves as an example for our research, and the model and conclusions obtained can be applied to other decision-making or reinforcement learning problems.
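As a point of reference, the objective of "maximizing cumulative rewards" is commonly written as an expected discounted return. The notation below (policy \pi, discount factor \gamma, reward r_t at step t) is standard textbook notation and an assumption here; it is not taken from the paper itself.

```latex
\pi^{*} = \arg\max_{\pi} \; \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} r_{t} \right],
\qquad 0 \le \gamma < 1
```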

Background
Related Work
Reinforcement Learning
Boltzmann distribution
Problem Description
Actions
Transition
Reward
Problem Solution
Approximate State Matching Q-Learning Algorithm
Analysis of Algorithms
Experiment
UAV Specifications
Farm Information
Data Simulation
Evaluation Indicator
Result
Conclusions and Future Work