Masking Actions in Reinforcement Learning: Enhanced PPO for Optimal Harvesting

TL;DR

This study develops a deep reinforcement learning framework with action masking to train autonomous tractor agents for herbaceous crop fields, achieving 65.2% area coverage with 250 actions, improving training efficiency and adaptability across various field shapes and sizes.
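The TL;DR reports coverage as a share of field area reached within a fixed action budget. The listing does not say how covered area is computed. Purely as an illustration, the sketch below assumes the field is discretized into grid cells and that each action moves the tractor by one cell. `capped_trajectory` and `coverage_fraction` are hypothetical helpers, not the authors' code.

```python
from typing import Iterable, List, Set, Tuple

Cell = Tuple[int, int]  # (column, row) on an assumed grid discretization of the field


def capped_trajectory(trajectory: List[Cell], max_actions: int = 250) -> List[Cell]:
    """Keep the start cell plus at most `max_actions` subsequent cells (one per action)."""
    if max_actions < 0:
        raise ValueError("max_actions must be non-negative")
    return trajectory[: max_actions + 1]


def coverage_fraction(field_cells: Set[Cell], visited: Iterable[Cell]) -> float:
    """Fraction of field cells visited at least once.

    Cells outside the field are ignored, and revisited cells are counted only once.
    """
    if not field_cells:
        raise ValueError("the field must contain at least one cell")
    covered = field_cells.intersection(visited)
    return len(covered) / len(field_cells)


field = {(x, y) for x in range(4) for y in range(4)}  # 16-cell square field
path = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (2, 1), (2, 1)]  # last move revisits a cell
print(coverage_fraction(field, path))                        # 0.375 (6 distinct cells / 16)
print(coverage_fraction(field, capped_trajectory(path, 2)))  # 0.1875 (3 cells / 16)
```

Revisits do not raise the score under this definition, so a redundant path wastes part of the action budget. That is one way to read the "no redundancy" goal in the highlights.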

Abstract

Highlights
  • Training of an AI agent able to drive efficiently on herbaceous crop fields with no redundancy (65.2% of the area covered with a clipped number of actions).
  • Creation of a framework to train AI agents to drive autonomously on herbaceous crop fields.
  • Development of a DRL policy that masks forbidden actions, easing the training phase of AI agents.

Considering the landscape of today's global agricultural sector, it is essential to align resource optimization and productivity with the development of long-term sustainable practices. The sector also faces challenges such as labor shortages and the high cost of developing and maintaining traditional machinery. Competition is increasing and new automation technologies are advancing. In this context, deep reinforcement learning (DRL) algorithms emerge as an ideal solution to meet the sector's demands. A review of the existing literature shows that previous research covers various applications of DRL models to specific agricultural tasks and regions. This work proposes a trained AI agent capable of autonomously driving agricultural tractors on any kind of 2D field, regardless of its shape or size. A novel DRL-based framework has been developed to train the model. This framework incorporates a fully customizable reward-penalty layer, reinforcement learning policies, field shapes and sizes, tractor configurations, and neural network architectures. The work also proposes a novel DRL policy that uses action masking to exclude forbidden actions. This policy accelerates convergence and improves the agent's learning efficiency. A comprehensive statistical test compares agents trained with different policies and approaches. The best-performing agent achieves a mean covered area of 65.2% with a clipped number of actions (250).

Keywords: Agriculture, Autonomous driving, Coverage path planning, Deep reinforcement learning, Navigation, PPO.
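The abstract says the proposed PPO policy masks forbidden actions but gives no implementation details. A widely used approach for a discrete action space works in two steps. First, give forbidden actions zero probability (equivalently, a logit of negative infinity) before sampling. Second, apply the same mask when recomputing log-probabilities for PPO's clipped ratio. The sketch below follows that approach. The mask semantics are an assumption: for example, `False` could mark a move that would take the tractor outside the field. All function names are hypothetical, and this is not the authors' implementation.

```python
import math
import random
from typing import List, Sequence


def masked_softmax(logits: Sequence[float], mask: Sequence[bool]) -> List[float]:
    """Softmax in which forbidden actions (mask entry False) get probability exactly 0.

    Equivalent to setting forbidden logits to -inf before a standard softmax.
    """
    if len(logits) != len(mask):
        raise ValueError("logits and mask must have the same length")
    if not any(mask):
        raise ValueError("at least one action must be allowed")
    # Subtract the largest *allowed* logit for numerical stability.
    top = max(l for l, ok in zip(logits, mask) if ok)
    exps = [math.exp(l - top) if ok else 0.0 for l, ok in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def sample_masked_action(logits: Sequence[float], mask: Sequence[bool], rng: random.Random) -> int:
    """Sample an action index; forbidden actions are skipped and can never be chosen."""
    probs = masked_softmax(logits, mask)
    u = rng.random()
    cumulative = 0.0
    last_allowed = max(i for i, ok in enumerate(mask) if ok)
    for i, p in enumerate(probs):
        if not mask[i]:
            continue
        cumulative += p
        if u < cumulative:
            return i
    return last_allowed  # guard against floating-point round-off in the cumulative sum


def masked_log_prob(logits: Sequence[float], mask: Sequence[bool], action: int) -> float:
    """Log-probability of `action` under the masked policy (needed for the PPO ratio)."""
    if not mask[action]:
        raise ValueError("action is masked and has zero probability")
    return math.log(masked_softmax(logits, mask)[action])


def ppo_clipped_term(new_log_prob: float, old_log_prob: float, advantage: float,
                     clip_eps: float = 0.2) -> float:
    """Per-sample PPO clipped surrogate objective (to be maximized)."""
    ratio = math.exp(new_log_prob - old_log_prob)
    clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    return min(ratio * advantage, clipped_ratio * advantage)


# Example: 4 actions, the last two forbidden in the current state.
logits = [2.0, 1.0, 0.5, 3.0]
mask = [True, True, False, False]
probs = masked_softmax(logits, mask)  # action 3 has the largest logit but probability 0
action = sample_masked_action(logits, mask, random.Random(0))  # always 0 or 1
```

Masked actions have exactly zero probability. They therefore never appear in rollouts and add nothing to the entropy term, so the policy never has to learn to avoid them through penalties. This is one plausible reason masking can speed up convergence, as the abstract reports.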

Similar Papers
  • Supplementary Content
  • 10.25394/pgs.12221960.v1
Game AI of StarCraft II based on Deep Reinforcement Learning
  • Apr 30, 2020
  • Figshare
  • Junjie Luo

This article studies a game AI agent for StarCraft II based on deep reinforcement learning (DRL). StarCraft II is currently regarded as the most challenging real-time strategy (RTS) game. It is also the most popular game in which researchers develop and improve AI agents. Building StarCraft II agents can help machine learning researchers identify the weaknesses of DRL and improve this family of algorithms. In 2018, DeepMind and Blizzard developed the StarCraft II Learning Environment (PySC2) to support the development of AI agents. After AlphaGo, DeepMind started a new DRL-based project called AlphaStar, and several laboratories have also published articles on StarCraft II agents. Most of this work studies agents for Terran and Zerg, two of the three races in StarCraft II. These agents perform well compared with most StarCraft II players. They are still far from defeating e-sports professionals, however, because StarCraft II has a large observation space and a large action space. Moreover, there is no publication on Protoss, the remaining race. Protoss is the most complicated race for AI agents because of its characteristics (a larger action space and a larger observation space). The research question of this paper is therefore whether a Protoss agent developed with a DRL-based model can defeat the high-level built-in cheating AI in a full-length game on a particular map. The population of this research design is the set of StarCraft II agents that researchers have built on DRL models. The sample is the Protoss agent in this paper. The raw data come from matches between the Protoss agent and the built-in AI agents, and PySC2 captures features and numerical variables in each match to obtain the training data.
The expected outcome is a DRL-based model that can train a Protoss agent to defeat high-level game AI agents, as measured by win rate. The model includes the Protoss action space, the observation space, and the implementation of the DRL algorithms. It is built on PySC2 v2.0, which provides additional action functions. Because of the complexity and unique characteristics of Protoss in StarCraft II, the model cannot be applied to other games or platforms. However, the way the model trains a Protoss agent can reveal the limitations of DRL and help advance DRL algorithms.

  • Research Article
  • Cited by 1
  • 10.4233/uuid:f8faacb0-9a55-453d-97fd-0388a3c848ee
Sample efficient deep reinforcement learning for control
  • Dec 15, 2019
  • Research Repository (Delft University of Technology)
  • Tim De Bruin

The arrival of intelligent, general-purpose robots that can learn to perform new tasks autonomously has been promised for a long time. Deep reinforcement learning combines reinforcement learning with deep neural network function approximation. It has the potential to let robots learn a wide range of new tasks while requiring very little prior knowledge or human help. This framework might therefore finally make general-purpose robots a reality. However, the biggest successes of deep reinforcement learning have so far been in simulated game settings. To translate these successes to the real world, these methods must become significantly better at learning quickly and safely. This thesis investigates what is needed to make this possible and contributes towards this goal.

Before deep reinforcement learning methods can be successfully applied in the robotics domain, we need to understand how, when, and why deep learning and reinforcement learning work well together. This thesis therefore starts with a literature review, presented in Chapter 2. The field is in some respects still in its infancy, but it is already clear that successful algorithms share important components. These components help to reconcile the differences between classical reinforcement learning methods and the procedures used to successfully train deep neural networks. The main challenges in combining deep learning with reinforcement learning center on the interdependencies between the policy, the training data, and the training targets. Commonly used tools for managing the detrimental effects of these interdependencies include target networks, trust region updates, and experience replay buffers. Besides reviewing these components, the chapter discusses a number of the more popular and historically relevant deep reinforcement learning methods.

Reinforcement learning involves learning through trial and error. However, robots (and their surroundings) are fragile, which makes these trials, and especially the errors, very costly. The amount of exploration will therefore often need to be drastically reduced over time, especially once reasonable behavior has been found. We demonstrate how, with common experience replay techniques, this can quickly lead to forgetting previously learned successful behaviors. This problem is investigated in Chapter 3. Experiments investigate which distributions of experiences over the state-action space lead to desirable learning behavior and which can cause problems. Actor-critic algorithms are shown to be especially sensitive to a lack of diversity in the action space, which can result from reducing exploration over time. Further relations between the properties of the control problem and the required data distributions are also shown. When control frequencies are high, the need for diversity in the action space is greater. For problems where generalizing the control strategy across the state space is more difficult, data diversity matters less.

Chapter 3 investigates which data distributions are most beneficial. Chapter 4 instead proposes practical algorithms to select useful experiences from a stream of experiences. We do not assume any control over this stream. This makes it possible to learn from additional sources of experience, such as other robots, experiences obtained while learning different tasks, and experiences obtained using predefined controllers. We make two separate judgments on the utility of individual experiences. The first judgment concerns the long-term utility of experiences and determines which experiences to keep in memory once the experience buffer is full. The second concerns the instantaneous utility of an experience to the learning agent and determines which experiences should be sampled from the buffer for learning. To estimate the short- and long-term utility of experiences, we propose proxies based on age, surprise, and the exploration intensity associated with each experience. We show how prior knowledge of the control problem can be used to decide which proxies to use. We also show how this knowledge can be used to estimate the optimal size of the experience buffer. It can likewise indicate whether to use importance sampling to compensate for the bias introduced by the selection procedure. Together, these choices can lead to a more stable learning procedure and better-performing controllers.

In Chapter 5 we look at what to learn from the collected data. Data are expensive in the robotics domain, so it is crucial to extract as much knowledge as possible from every datum. Reinforcement learning, by default, does not do so. We therefore supplement reinforcement learning with explicit state representation learning objectives. These objectives assume that the neural network controller to be learned consists of two consecutive parts. The first part (the state encoder) maps the observed sensor data to a compact and concise representation of the state of the robot and its environment. The second part determines which actions to take based on this state representation. The representation of the state of the world is useful for more than just the task at hand. It can therefore also be trained with more general (state representation learning) objectives than the reinforcement learning objective of the current task. We show how these additional training objectives allow a much more general state representation to be learned. This in turn makes it possible to learn broadly applicable control strategies more quickly. We also introduce a training method that ensures the added objectives further the goal of reinforcement learning. It does so without letting their changes to the state encoder destabilize the learning process.

The final contribution of this thesis, presented in Chapter 6, focuses on the optimization procedure used to train the second part of the policy: the mapping from the state representation to the actions. We show that the state encoder can be trained efficiently with standard gradient-based optimization techniques, but perfecting this second mapping is more difficult. Obtaining high-quality estimates of the gradients of policy performance with respect to the parameters of this part of the network is usually not feasible. Gradient-based optimization can therefore yield a reasonable policy relatively quickly. This speed, however, comes at the cost of both the stability of the learning process and the final performance of the controller. Additionally, the unstable nature of this learning process makes it extremely sensitive to the hyperparameters of the training method. This places an unfortunate emphasis on hyperparameter tuning for getting deep reinforcement learning algorithms to work well. Gradient-free optimization algorithms can be simpler and more stable, but tend to be much less sample efficient. We show how the desirable aspects of both approaches can be combined. First, the entire network is trained with gradient-based optimization. Then, the final part of the network is fine-tuned in a gradient-free manner. With this approach, the policy improves stably to a performance level that gradient-based optimization alone does not reach. It also uses far fewer trials than methods that rely only on gradient-free optimization.
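The Chapter 4 summary separates two judgments. A long-term judgment decides which experiences to keep once the buffer is full. A short-term judgment decides which stored experiences to sample. The sketch below shows that two-level structure using only one of the proxies the summary mentions, surprise. It assumes surprise is the absolute temporal-difference error recorded for each experience. The class names, the eviction rule, and the sampling floor are illustrative assumptions, not the thesis's algorithms.

```python
import random
from typing import Any, List


class Experience:
    def __init__(self, transition: Any, surprise: float):
        self.transition = transition  # e.g. (state, action, reward, next_state)
        self.surprise = surprise      # assumed proxy: |TD error| when last evaluated


class SurpriseBuffer:
    """Fixed-capacity replay buffer with separate retention and sampling rules."""

    def __init__(self, capacity: int, rng: random.Random, floor: float = 1e-6):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.rng = rng
        self.floor = floor  # keeps every stored experience sampleable
        self.items: List[Experience] = []

    def add(self, exp: Experience) -> None:
        if len(self.items) < self.capacity:
            self.items.append(exp)
            return
        # Long-term judgment: once full, overwrite the least surprising stored
        # experience, but only if the newcomer is more surprising.
        idx = min(range(len(self.items)), key=lambda i: self.items[i].surprise)
        if exp.surprise > self.items[idx].surprise:
            self.items[idx] = exp

    def sample(self, k: int) -> List[Experience]:
        # Short-term judgment: sample with probability proportional to surprise.
        if not self.items:
            raise ValueError("buffer is empty")
        weights = [e.surprise + self.floor for e in self.items]
        return self.rng.choices(self.items, weights=weights, k=k)


buf = SurpriseBuffer(capacity=2, rng=random.Random(0))
for name, s in [("a", 0.1), ("b", 0.5), ("c", 0.3), ("d", 0.2)]:
    buf.add(Experience(name, s))
print(sorted(e.transition for e in buf.items))  # ['b', 'c']
```

Sampling by surprise biases updates toward high-error transitions. The summary notes that importance sampling can compensate for this kind of selection bias. This sketch omits that correction.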

  • Research Article
  • Cited by 29
  • 10.1088/1361-6560/ac9cb3
Deep reinforcement learning and its applications in medical imaging and radiation therapy: a survey
  • Nov 11, 2022
  • Physics in Medicine & Biology
  • Lanyu Xu + 2 more

Reinforcement learning addresses sequential decision-making by learning a policy through trial and error, based on interaction with the environment. Combining deep learning with reinforcement learning lets the agent learn the interactions and the distribution of rewards from state-action pairs. This yields effective and efficient solutions in more complex and dynamic environments. Deep reinforcement learning (DRL) has surpassed human-level performance in games and in many other simulated environments. This paper introduces the basics of reinforcement learning and reviews the main categories of DRL algorithms. It also reviews DRL models developed for medical image analysis and for radiation treatment planning optimization. We also discuss the current challenges of DRL and the approaches proposed to make DRL more generalizable and robust in real-world environments. Annotated medical image data are scarce and heterogeneous, which has been a major obstacle to deploying deep learning models in the clinic. Through careful design of the reward function, agent interactions, and environment models, DRL algorithms can address these challenges. DRL is an active research area with great potential to improve deep learning applications in medical imaging and radiation therapy planning.

  • Research Article
  • Cited by 61
  • 10.1016/j.tics.2020.09.002
Artificial Intelligence and the Common Sense of Animals.
  • Oct 8, 2020
  • Trends in Cognitive Sciences
  • Murray Shanahan + 3 more

  • Research Article
  • Cited by 1
  • 10.1088/1742-6596/2405/1/012032
Space Manipulator Assembly Operation Technique based on Deep Residual Reinforcement Learning
  • Dec 1, 2022
  • Journal of Physics: Conference Series
  • Kui Huang + 4 more

In recent years, complex space operations such as the maintenance and assembly of on-orbit spacecraft have become increasingly common. Traditional robot planning and control methods require precise dynamic models, which makes them difficult to apply to on-orbit assembly operations in extreme space environments. For typical space operation tasks, such as plug-and-pull operations, a control strategy can be designed by hand. Combining the output of such a hand-designed control strategy with a deep reinforcement learning algorithm can reduce the training difficulty of deep reinforcement learning. This makes the learning process more efficient and the training results better. In this paper, a deep residual reinforcement learning algorithm combined with a heuristic control strategy is constructed. It is used to train a space manipulator arm for assembly operations in a highly realistic simulation environment. The experimental data show that the residual deep reinforcement learning algorithm designed in this paper converges rapidly and can complete the on-orbit assembly task with high probability.
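The abstract combines a heuristic control strategy with a residual DRL algorithm but does not give the formulation. In residual reinforcement learning as commonly described, the executed action is the hand-designed controller's output plus a correction produced by the learned policy. The sketch below assumes a proportional controller toward a target pose and a symmetric actuator limit. The gain, the limit, and all function names are hypothetical.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def heuristic_controller(pos: Sequence[float], target: Sequence[float], gain: float = 0.5) -> Vector:
    """Hypothetical hand-designed controller: a proportional step toward the target."""
    return [gain * (t - p) for p, t in zip(pos, target)]


def residual_action(pos: Sequence[float], target: Sequence[float],
                    residual_policy: Callable[[Sequence[float]], Vector],
                    limit: float = 1.0) -> Vector:
    """Executed action = heuristic output + learned residual, clipped to [-limit, limit]."""
    base = heuristic_controller(pos, target)
    correction = residual_policy(pos)  # in practice, the output of the DRL policy network
    return [max(-limit, min(limit, b + c)) for b, c in zip(base, correction)]


def zero_residual(state: Sequence[float]) -> Vector:
    # An untrained residual that outputs zeros reproduces the heuristic exactly.
    return [0.0 for _ in state]


print(residual_action([0.0, 0.0, 0.0], [1.0, -4.0, 0.2], zero_residual))  # [0.5, -1.0, 0.1]
```

A commonly cited motivation for this design is that the agent starts from the heuristic's reasonable behavior and only has to learn corrections. That is consistent with the fast convergence the abstract reports.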

  • Research Article
  • Cited by 22
  • 10.1088/1742-6596/2138/1/012011
A Review of Mobile Robot Path Planning Based on Deep Reinforcement Learning Algorithm
  • Dec 1, 2021
  • Journal of Physics: Conference Series
  • Yanwei Zhao + 2 more

In path planning, a mobile robot uses its onboard sensors to obtain information about its surroundings and its own state. This lets it avoid obstacles and move toward a target point. Deep reinforcement learning combines reinforcement learning and deep learning. It is mainly used to address perception and decision-making problems and has become an important research branch in artificial intelligence. This paper first introduces the fundamentals of deep learning and reinforcement learning. It then describes the state of research on value-function-based and policy-gradient-based deep reinforcement learning algorithms for path planning. It also covers applications of deep reinforcement learning in computer games, video games, and autonomous navigation. Finally, it briefly summarizes the algorithms and applications of deep reinforcement learning and outlines future directions.

  • Research Article
  • Cited by 1
  • 10.48175/ijarsct-943
DDPG Agent to Swing Up and Balance Cart-Pole System
  • Apr 9, 2021
  • International Journal of Advanced Research in Science, Communication and Technology
  • Buvanesh Pandian V

Reinforcement learning is a mathematical framework for agents to interact intelligently with their environment. In supervised learning, a system learns from labeled data. Reinforcement learning agents instead learn how to act by trial and error, receiving only a reward signal from their environment. A field in which reinforcement learning has been particularly successful is robotics [3]. However, real-world control problems are also particularly challenging because of the noise and high dimensionality of the input data (e.g., visual input). In recent years, deep neural networks have been used successfully in supervised learning to extract meaning from this kind of data. Building on these advances, deep reinforcement learning has been used to solve complex problems such as Atari games and Go. Mnih et al. [1] built a system with fixed hyperparameters that learned to play 49 different Atari games from raw pixel inputs alone. However, to apply the same methods to real-world control problems, deep reinforcement learning must be able to handle continuous action spaces. Discretizing continuous action spaces scales poorly, because the number of discrete actions grows exponentially with the dimensionality of the action. Furthermore, a parametrized policy can be advantageous because it can generalize across the action space. In this thesis we therefore study a state-of-the-art deep reinforcement learning algorithm, Deep Deterministic Policy Gradients (DDPG). We provide a theoretical comparison with other popular methods, evaluate its performance, identify its limitations, and investigate future directions of research. The remainder of the thesis is organized as follows. We start by introducing the field of interest, machine learning, focusing on deep learning and reinforcement learning. We continue by describing in detail the two main algorithms at the core of this study, namely Deep Q-Network (DQN) and Deep Deterministic Policy Gradients (DDPG). We then provide implementation details of DDPG and our test environment, followed by a description of the benchmark test cases. Finally, we discuss the results of our evaluation, identifying limitations of the current approach and proposing future avenues of research.
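The claim that discretization scales poorly follows from simple counting. If each of d continuous action dimensions is split into k bins, the joint discrete action set has k^d elements. The sketch below makes this concrete. The bin count of 10 and the 7-dimensional case (a hypothetical 7-joint arm) are illustrative choices, not taken from the paper.

```python
def num_discrete_actions(bins_per_dim: int, action_dims: int) -> int:
    """Size of the joint action set when each of `action_dims` continuous
    dimensions is discretized into `bins_per_dim` values (grows as k ** d)."""
    if bins_per_dim < 1 or action_dims < 0:
        raise ValueError("need bins_per_dim >= 1 and action_dims >= 0")
    return bins_per_dim ** action_dims


for dims in (1, 2, 7):
    print(dims, num_discrete_actions(10, dims))
# 1 10          (e.g. the single force applied to a cart-pole)
# 2 100
# 7 10000000    (e.g. a hypothetical 7-joint arm)
```

An actor that outputs a continuous action directly, as DDPG's deterministic actor does, avoids enumerating this set altogether.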

  • Research Article
  • Cited by 5
  • 10.1049/joe.2018.8314
Deep imitation reinforcement learning with expert demonstration data
  • Oct 31, 2018
  • The Journal of Engineering
  • Menglong Yi + 3 more

In recent years, deep reinforcement learning (DRL) has made impressive achievements in many fields. However, existing DRL algorithms usually require a large amount of exploration to obtain a good action policy. In addition, in many complex situations, the reward function cannot be designed well enough to meet task requirements. These two problems make it difficult for DRL to learn a good action policy within a relatively short period. Expert data can provide effective guidance and avoid unnecessary exploration. This study proposes a deep imitation reinforcement learning (DIRL) algorithm that uses a certain amount of expert demonstration data to speed up DRL training. In the proposed method, the learning agent first imitates the expert's action policy by learning from demonstration data. After imitation learning, DRL is used to optimise the action policy through self-learning. Experimental comparisons on the Mario racing video game show that the proposed DIRL algorithm with expert demonstration data performs much better than previous DRL algorithms without expert guidance.

  • Research Article
  • Cited by 15
  • 10.1016/j.engappai.2024.108925
Harnessing deep reinforcement learning algorithms for image categorization: A multi algorithm approach
  • Jul 17, 2024
  • Engineering Applications of Artificial Intelligence
  • Dhanvanth Reddy Yerramreddy + 4 more

  • Research Article
  • Cited by 1
  • 10.1016/j.agwat.2025.110030
Deep Reinforcement Learning for irrigation optimization: Advantages, opportunities, and challenges
  • Dec 1, 2025
  • Agricultural Water Management
  • Jiamei Liu + 8 more

Irrigation decision-making using Reinforcement Learning (RL) performs well in changing environments but easily falls into sub-optimal solutions with high-dimensional data. Deep Reinforcement Learning (DRL) fuses RL with Deep Learning (DL) and excels at learning adaptive, long-term irrigation strategies directly from high-dimensional environment data. This paper systematically reviews the applications of DRL in irrigation optimization. It covers both pre-trained environments based on crop growth simulators and dynamic environments driven by real-time sensors. We discuss the strengths of classic DRL algorithms, including their ability to handle dynamic and non-linear environments. We also review their performance in multi-objective irrigation optimization and decision-making. In addition, we identify constraints on applying DRL to irrigation decision-making, including data scarcity, poor model interpretability, and difficulties in field deployment. The review shows that DRL can provide a powerful framework for adaptive irrigation but is constrained by the gap between simulation and real-world complexity. To address these limitations, we discuss directions for future work, such as developing multi-objective DRL algorithms. These approaches will improve DRL modeling outcomes and provide a technological foundation for smart agriculture and sustainable resource management.

Highlights
  • Review the application of Deep Reinforcement Learning (DRL) in agricultural irrigation.
  • Analyze the performance of DRL algorithms in irrigation decision-making.
  • Compare DRL models based on different environments in irrigation optimization.
  • Discuss further work to improve DRL performance in irrigation optimization.

  • Conference Article
  • 10.1145/3650215.3650362
A novel portfolio strategy approach using deep reinforcement learning
  • Oct 27, 2023
  • Xu Yang + 3 more

Portfolio strategy is an enduring topic in finance. An important research direction in the information technology era is to combine deep learning and reinforcement learning to solve portfolio problems and enable intelligent trading. Building on deep reinforcement learning, this paper uses a deep learning BiLSTM to predict rises and falls in stock prices. The predictions help the reinforcement learning agents observe and better judge the current situation and decide on their trading actions, which yields an optimal portfolio strategy. Ten US stocks are selected for the experiments. The experiments use a realistic market simulation and compare the model against a benchmark method. The model based on the deep reinforcement learning algorithm reaches a cumulative return of 87.4%, with a higher return rate and the lowest risk. The model therefore has some practical value.

  • Book Chapter
  • Cited by 1
  • 10.1007/978-3-030-75490-7_2
Deep Reinforcement Learning: A New Frontier in Computer Vision Research
  • Jan 1, 2021
  • Sejuti Rahman + 3 more

Computer vision has advanced so far that machines can now think and see as humans do. Deep learning in particular has raised the bar of excellence in computer vision. The recent emergence of deep reinforcement learning, however, promises to reach even greater heights. It combines deep neural networks with reinforcement learning and offers numerous advantages over both. As a relatively recent technique, it has not yet been the subject of many works, so its true potential is yet to be revealed. This chapter therefore sheds light on the fundamentals of deep reinforcement learning, starting with the preliminaries, followed by the theory and the basic algorithms. It then covers some of their variations, namely attention-aware deep reinforcement learning, deep progressive reinforcement learning, and multi-agent deep reinforcement learning. The chapter also discusses existing deep reinforcement learning work in computer vision. This includes image processing and understanding, video captioning and summarization, visual search and tracking, action detection, recognition and prediction, and robotics. It further aims to elucidate the existing challenges and research prospects of deep reinforcement learning in computer vision. This chapter can serve as a starting point for researchers who want to apply deep reinforcement learning in computer vision and tap into its considerable potential.

  • Research Article
  • Cited by 182
  • 10.1016/j.jjimei.2022.100094
How are reinforcement learning and deep learning algorithms used for big data based decision making in financial industries–A review and research agenda
  • Jun 28, 2022
  • International Journal of Information Management Data Insights
  • Vinay Singh + 5 more

  • Research Article
  • Cited by 121
  • 10.1109/access.2020.2970433
Deep Interactive Reinforcement Learning for Path Following of Autonomous Underwater Vehicle
  • Jan 1, 2020
  • IEEE Access
  • Qilei Zhang + 4 more

Autonomous underwater vehicles (AUVs) play an increasingly important role in ocean exploration. Existing AUVs are usually not fully autonomous and are generally limited to pre-planned or pre-programmed tasks. Reinforcement learning (RL) and deep reinforcement learning have been introduced into AUV design and research to improve AUV autonomy. However, these methods are still difficult to apply directly to real AUV systems because of sparse rewards and low learning efficiency. In this paper, we propose a deep interactive reinforcement learning method for AUV path following that combines the advantages of deep reinforcement learning and interactive RL. A human trainer cannot provide rewards to an AUV while it is operating in the ocean, and the AUV needs to adapt to a changing environment. We therefore further propose a deep reinforcement learning method that learns from human rewards and environmental rewards simultaneously. We test our methods on two AUV path-following tasks, straight-line following and sinusoidal-curve following, in simulation on the Gazebo platform. Our experimental results show that with the proposed deep interactive RL method, the AUV converges faster than a DQN learner trained only on environmental rewards. Moreover, an AUV learning from both human and environmental rewards with our deep RL method achieves similar or even better performance than with deep interactive RL. It can also adapt to the actual environment by further learning from environmental rewards.

  • Conference Article
  • Cited by 10
  • 10.1145/3511616.3513104
A Novel Policy for Pre-trained Deep Reinforcement Learning for Speech Emotion Recognition
  • Feb 14, 2022
  • Thejan Rajapakshe + 4 more

Reinforcement Learning (RL) is a semi-supervised learning paradigm in which an agent learns by interacting with an environment. Combining deep learning with RL, known as Deep Reinforcement Learning (deep RL), provides an efficient method for learning how to interact with the environment. Deep RL has achieved tremendous success in gaming, for example AlphaGo, but its potential has rarely been explored for challenging tasks such as Speech Emotion Recognition (SER). Deep RL applied to SER could improve the performance of an automated call-centre agent by dynamically learning emotion-aware responses to customer queries. The policy employed by the RL agent plays a major role in action selection, yet there is currently no RL policy tailored for SER. In addition, long learning periods are a general challenge for deep RL and can slow down learning for SER. In this paper, we therefore introduce a novel policy tailored for SER, the "Zeta policy". We also apply pre-training in deep RL to achieve a faster learning rate. Pre-training with a cross dataset was also studied to assess the feasibility of pre-training the RL agent on a similar dataset when no real environmental data is available. The IEMOCAP and SAVEE datasets were used for evaluation. The task was to recognize four emotions (happy, sad, angry, and neutral) in the provided utterances. Experimental results show that the proposed "Zeta policy" performs better than existing policies. The results also show that pre-training can reduce training time by shortening the warm-up period and is robust in cross-corpus scenarios.
