Preference-based deep reinforcement learning with automatic curriculum learning for map-free UGV navigation in factory-like environments
23
- 10.1109/tie.2022.3148753
- Jan 1, 2023
- IEEE Transactions on Industrial Electronics
16
- 10.3390/machines10070500
- Jun 22, 2022
- Machines
2853
- 10.1145/504729.504754
- Mar 1, 2002
- Communications of the ACM
12
- 10.1007/s11227-023-05489-5
- Jun 20, 2023
- The Journal of Supercomputing
18
- 10.1109/lra.2023.3269295
- Jun 1, 2023
- IEEE Robotics and Automation Letters
16
- 10.1016/j.eswa.2024.123202
- Jan 11, 2024
- Expert Systems with Applications
26
- 10.1016/j.aei.2023.101959
- Apr 1, 2023
- Advanced Engineering Informatics
16
- 10.1016/j.aei.2023.102328
- Dec 20, 2023
- Advanced Engineering Informatics
4
- 10.1007/s00521-023-08385-4
- Jun 14, 2023
- Neural Computing and Applications
45
- 10.1186/s13638-020-01721-5
- May 14, 2020
- EURASIP Journal on Wireless Communications and Networking
- Research Article
- 10.62051/ijcsit.v3n3.17
- Aug 12, 2024
- International Journal of Computer Science and Information Technology
This article surveys the application methods, progress, and challenges of deep reinforcement learning (DRL) in intelligent navigation. With advances in computing and artificial intelligence, deep learning and reinforcement learning have been combined into deep reinforcement learning, which shows significant advantages in handling high-dimensional state spaces and complex decision-making tasks. The article first reviews traditional navigation methods, including simulated annealing, artificial potential fields, and fuzzy logic. It then analyzes graph-based methods such as the A* algorithm, probabilistic roadmaps, and rapidly-exploring random trees, as well as bio-inspired methods such as genetic algorithms, artificial neural networks, ant colony optimization, and particle swarm optimization. It then introduces the basic principles of reinforcement learning and its value-based, policy-based, and combined methods, focusing on specific applications of deep reinforcement learning to navigation tasks, such as the deep Q-network (DQN), deep deterministic policy gradient (DDPG), and advantage actor-critic (A2C) algorithms. Finally, the article discusses the advantages, challenges, and future directions of deep reinforcement learning in navigation, emphasizing its path-planning and decision-optimization capabilities in complex dynamic environments.
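A minimal sketch of the DQN update the abstract names, assuming hypothetical PyTorch `q_net`/`target_net` modules and a replay batch of tensors; hyperparameters are illustrative only.

```python
import torch
import torch.nn.functional as F

def dqn_update(q_net, target_net, optimizer, batch, gamma=0.99):
    """One DQN step: regress Q(s, a) toward r + gamma * max_a' Q_target(s', a').

    `q_net` and `target_net` are assumed torch modules mapping a state batch to
    per-action Q-values; `batch` is an assumed (s, a, r, s_next, done) tuple of
    tensors, with `done` a 0/1 float mask.
    """
    s, a, r, s_next, done = batch
    with torch.no_grad():
        # Bootstrap from the frozen target network; zero out terminal states.
        target = r + gamma * (1 - done) * target_net(s_next).max(dim=1).values
    q_sa = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
    loss = F.smooth_l1_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```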
- Research Article
13
- 10.1088/1742-6596/2138/1/012011
- Dec 1, 2021
- Journal of Physics: Conference Series
Path planning means that a mobile robot, using its onboard sensors to perceive the surrounding environment and its own state, can avoid obstacles while moving toward a target point. Deep reinforcement learning, which combines reinforcement learning with deep learning and is mainly used for perception and decision-making problems, has become an important research branch of artificial intelligence. This paper first introduces the basics of deep learning and reinforcement learning. It then describes the state of research on value-function and policy-gradient deep reinforcement learning algorithms for path planning, along with applications of deep reinforcement learning in computer games, video games, and autonomous navigation. Finally, it briefly summarizes deep reinforcement learning algorithms and applications and offers an outlook.
- Research Article
18
- 10.3390/e24121787
- Dec 6, 2022
- Entropy
With the continuous development of deep reinforcement learning in intelligent control, combining automatic curriculum learning with deep reinforcement learning can improve training performance and efficiency by proceeding from easy tasks to difficult ones. Most existing automatic curriculum learning algorithms rank curricula using expert experience and a single network, which makes task ranking difficult and convergence slow. In this paper, we propose a curriculum reinforcement learning method based on K-fold cross-validation that estimates a relative difficulty score for curriculum tasks. Drawing on the human notion of learning from easy to difficult, the method divides automatic curriculum learning into a difficulty-assessment stage and a sorting stage. By training teacher models in parallel and cross-evaluating task-sample difficulty, the method sequences curriculum tasks more effectively. Simulation experiments in two types of multi-agent environments show that the K-fold cross-validation method improves the training speed of the MADDPG algorithm while generalizing to other multi-agent deep reinforcement learning algorithms based on a replay buffer mechanism.
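A hedged sketch of the K-fold difficulty-estimation idea this abstract describes: K teacher policies are each trained on K-1 folds of the task set, and every task is scored only by teachers that never trained on it. The `train_teacher` and `evaluate` callables are hypothetical placeholders, not the paper's API.

```python
import random

def kfold_task_difficulty(tasks, k, train_teacher, evaluate):
    """Order tasks easy-to-hard via K-fold cross-evaluation.

    `train_teacher(task_subset)` is assumed to return a policy trained on those
    tasks; `evaluate(policy, task)` is assumed to return a success rate in
    [0, 1]. Tasks are assumed to be hashable identifiers. Lower held-out
    success implies a harder task.
    """
    tasks = list(tasks)
    random.shuffle(tasks)
    folds = [tasks[i::k] for i in range(k)]
    difficulty = {}
    for i, held_out in enumerate(folds):
        # Train a teacher on every fold except the held-out one.
        train_set = [t for j, fold in enumerate(folds) if j != i for t in fold]
        teacher = train_teacher(train_set)
        for task in held_out:
            # Difficulty = 1 - success of a teacher that never saw this task.
            difficulty[task] = 1.0 - evaluate(teacher, task)
    # Curriculum ordering: lowest estimated difficulty first.
    return sorted(tasks, key=lambda t: difficulty[t])
```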
- Research Article
- 10.4233/uuid:f8faacb0-9a55-453d-97fd-0388a3c848ee
- Dec 15, 2019
Sample efficient deep reinforcement learning for control
- Book Chapter
1
- 10.1007/978-3-030-75490-7_2
- Jan 1, 2021
Computer vision has advanced to the point that machines can now see and reason much as humans do, with deep learning in particular raising the bar of excellence. The recent emergence of deep reinforcement learning promises even greater heights, as it combines deep neural networks with reinforcement learning and offers numerous advantages over both. Being a relatively recent technique, it has not yet seen many works, and its true potential is yet to be unveiled. This chapter therefore sheds light on the fundamentals of deep reinforcement learning, starting with the preliminaries, followed by the theory, basic algorithms, and some variations, namely attention-aware deep reinforcement learning, deep progressive reinforcement learning, and multi-agent deep reinforcement learning. It also discusses existing deep reinforcement learning work in computer vision, such as image processing and understanding, video captioning and summarization, visual search and tracking, action detection, recognition and prediction, and robotics. The chapter further elucidates the existing challenges and research prospects of deep reinforcement learning in computer vision, and may serve as a starting point for researchers aspiring to apply deep reinforcement learning to the field and tap its immense potential.
- Research Article
33
- 10.1049/cit2.12043
- Apr 21, 2021
- CAAI Transactions on Intelligence Technology
Here, the challenges of sample efficiency and navigation performance in deep reinforcement learning for visual navigation are addressed, and a deep imitation reinforcement learning approach is proposed. Our contributions are threefold. First, a framework combining imitation learning with deep reinforcement learning is presented, which enables a robot to learn a stable navigation policy faster in the target-driven navigation task. Second, surrounding images are taken as the observation instead of sequential images, which improves navigation performance by providing more information. Moreover, a simple yet efficient template-matching method is adopted to determine the stop action, making the system more practical. Simulation experiments in the AI-THOR environment show that the proposed approach outperforms previous end-to-end deep reinforcement learning approaches, demonstrating its effectiveness and efficiency.
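A minimal sketch of a template-matching stop criterion of the kind this abstract mentions, using OpenCV's normalized cross-correlation; the threshold value and function name are assumptions, not the paper's implementation.

```python
import cv2

def should_stop(observation_bgr, goal_template_bgr, threshold=0.8):
    """Decide the 'stop' action by matching the current view against a goal image.

    `observation_bgr` and `goal_template_bgr` are assumed BGR uint8 arrays with
    the observation at least as large as the template. Returns True when the
    best normalized cross-correlation score exceeds `threshold`.
    """
    result = cv2.matchTemplate(observation_bgr, goal_template_bgr,
                               cv2.TM_CCOEFF_NORMED)
    return float(result.max()) >= threshold
```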
- Supplementary Content
1
- 10.1016/j.neuron.2021.01.021
- Feb 1, 2021
- Neuron
What can classic Atari video games tell us about the human brain?
- Research Article
18
- 10.1088/1361-6560/ac9cb3
- Nov 11, 2022
- Physics in Medicine & Biology
Reinforcement learning takes a sequential decision-making approach, learning a policy through trial and error based on interaction with the environment. Combining deep learning with reinforcement learning empowers an agent to learn the interactions and reward distributions of state-action pairs, achieving effective and efficient solutions in more complex and dynamic environments. Deep reinforcement learning (DRL) has demonstrated astonishing performance, surpassing human-level performance in the game domain and many other simulated environments. This paper introduces the basics of reinforcement learning and reviews the categories of DRL algorithms and the DRL models developed for medical image analysis and radiation treatment planning optimization. We also discuss the current challenges of DRL and approaches proposed to make it more generalizable and robust in real-world environments. Through careful design of reward functions, agent interactions, and environment models, DRL algorithms can address the scarcity and heterogeneity of annotated medical image data, which has been a major obstacle to deploying deep learning models in the clinic. DRL is an active research area with enormous potential to improve deep learning applications in medical imaging and radiation therapy planning.
- Conference Article
1
- 10.1109/humanoids53995.2022.10000201
- Nov 28, 2022
We solve a pedestrian visual navigation problem with a first-person view in an urban setting via end-to-end deep reinforcement learning. The major challenges lie in severe partial observability and sparse positive experiences of reaching the goal. To address partial observability, we propose a novel 3D-temporal convolutional network to encode sequential historical visual observations; its effectiveness is verified by comparison with a commonly used frame-stacking approach. For sparse positive samples, we propose an improved automatic curriculum learning algorithm, NavACL+, which proposes meaningful curricula starting from easy tasks and gradually generalizing to challenging ones. NavACL+ is shown to facilitate the learning process with 21% earlier convergence, to improve the task success rate on difficult tasks by 40% compared to the original NavACL algorithm [1], and to offer enhanced generalization to different initial poses compared to training from a fixed initial pose.
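A rough sketch of the automatic-curriculum idea behind NavACL-style methods as summarized above: a learned success predictor filters candidate tasks so training concentrates near the agent's competence frontier. The `sample_task`/`predict_success` interfaces and the probability band are assumptions, not the paper's exact algorithm.

```python
def propose_frontier_tasks(sample_task, predict_success, n_tasks,
                           low=0.1, high=0.9, max_tries=10_000):
    """Collect tasks whose predicted success probability lies in [low, high].

    `sample_task()` is assumed to draw a random start/goal configuration, and
    `predict_success(task)` to return the current policy's estimated success
    probability; trivially easy or hopeless tasks are rejected.
    """
    frontier = []
    for _ in range(max_tries):
        task = sample_task()
        p = predict_success(task)
        if low <= p <= high:  # neither already solved nor out of reach
            frontier.append(task)
            if len(frontier) == n_tasks:
                break
    return frontier
```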
- Research Article
6
- 10.1360/n972016-00741
- Sep 20, 2016
- Chinese Science Bulletin
Learning ability is a basic characteristic of human intelligence. The July 1, 2005 issue of Science published a list of 125 important open questions in science; question 94 asks "What are the limits of learning by machines?", with the annotation "Computers can already beat the world's best chess players, and they have a wealth of information on the Web to draw on. But abstract reasoning is still beyond any machine." Artificial intelligence has since made great progress. In 1997, IBM's Deep Blue supercomputer defeated chess master Garry Kasparov. On February 14, 2011, IBM's Watson supercomputer won a practice round against Jeopardy! champions Ken Jennings and Brad Rutter. In March 2016, Google DeepMind's AlphaGo sealed a 4-1 victory over South Korean Go grandmaster Lee Se-dol. This paper focuses on the machine learning methods behind AlphaGo, including reinforcement learning, deep learning, and deep reinforcement learning, and analyzes existing problems and the latest research progress. Deep reinforcement learning combines deep learning with reinforcement learning to realize learning algorithms that map directly from perception to action: much like human behavior, sensory input such as vision is fed in, and actions are output directly through a deep neural network. Deep reinforcement learning has the potential to let robots learn a variety of skills and achieve full autonomy. Although reinforcement learning has been practiced successfully, state features traditionally had to be set manually, which is difficult for complex scenes, prone to the curse of dimensionality, and limited in expressiveness. In 2010, Sascha Lange and Martin Riedmiller proposed deep auto-encoder neural networks for feature extraction in reinforcement learning, applied to vision-based control. In 2013, DeepMind proposed the deep Q-network (DQN) at NIPS 2013, using a convolutional neural network to extract features for reinforcement learning; they continued to improve it and published an enhanced version of DQN in Nature in 2015, which aroused widespread attention. To push past the limits of machine learning, cognitive machine learning is proposed, combining machine learning with brain cognition so that machine intelligence constantly evolves and gradually reaches the human level of artificial intelligence. The author proposes a cognitive model entitled Consciousness And Memory (CAM), which consists of memory, consciousness, high-level cognitive functions, perception, and motor components. High-level cognitive functions of the brain include learning, language, thinking, decision making, emotion, and so on. Learning is a process of accepting stimuli through the nervous system to acquire new behaviors and habits and to accumulate experience. Given the current progress of brain science and cognitive science, cognitive machine learning may focus on learning emergence, procedural-memory knowledge learning, learning evolution, and so on. For intelligence, evolution refers to learning how to learn, with structure changing accordingly; it is important to record learning results through structural change and thereby improve the learning method.
- Research Article
24
- 10.1016/j.tics.2020.09.002
- Oct 8, 2020
- Trends in Cognitive Sciences
Artificial Intelligence and the Common Sense of Animals.
- Conference Article
2
- 10.1117/12.2622154
- Jun 6, 2022
Traditionally, learning from human demonstrations via direct behavior cloning can yield high-performance policies, provided the algorithm has access to large amounts of high-quality data covering the scenarios most likely to be encountered when the agent is operating. In real-world settings, however, expert data is limited, and it is desirable to train an agent whose behavior policy is general enough to handle situations the human expert never demonstrated. An alternative is to learn these policies without supervision via deep reinforcement learning, but such algorithms require a large amount of computing time to perform well on complex tasks with high-dimensional state and action spaces, such as those found in StarCraft II. Automatic curriculum learning is a recent mechanism comprising techniques designed to speed up deep reinforcement learning by adjusting the difficulty of the current task to the agent's current capabilities. Designing a proper curriculum can be challenging for sufficiently complex tasks, however, so we leverage human demonstrations to guide agent exploration during training. In this work, we aim to train deep reinforcement learning agents that can command multiple heterogeneous actors, where starting positions and the overall difficulty of the task are controlled by a curriculum generated automatically from a single human demonstration. Our results show that an agent trained via automated curriculum learning can outperform state-of-the-art deep reinforcement learning baselines and match the performance of the human expert in a simulated command-and-control task in StarCraft II modeled on a real military scenario.
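One plausible reading of the demonstration-driven curriculum described above, sketched under assumptions: start states are drawn from points along the recorded demonstration, sliding away from the goal as the agent's success rate improves. The `demo_states` ordering, the schedule, and the threshold are hypothetical, not the paper's method.

```python
import random

def curriculum_start_state(demo_states, success_rate, advance_at=0.8):
    """Pick a start state along a demonstration trajectory, goal-end first.

    `demo_states` is assumed ordered from episode start to goal. Early in
    training the agent starts near the goal; as its recent success rate
    approaches `advance_at`, the sampling window slides toward the original
    start, making episodes progressively harder.
    """
    # Fraction of the trajectory made available as start states.
    frac = min(1.0, 0.1 + 0.9 * (success_rate / advance_at))
    window = max(1, int(frac * len(demo_states)))
    # Sample uniformly from the last `window` states (closest to the goal).
    return random.choice(demo_states[-window:])
```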
- Conference Article
7
- 10.1109/icoias56028.2022.9931272
- Sep 23, 2022
This work proposes deep reinforcement learning with continuous action control for mobile robot navigation and obstacle avoidance in previously unknown areas without pre-made maps, extending the capabilities of mobile robots beyond conventional map-based navigation. Deep reinforcement learning enables the robot to learn how to make decisions and interact with the environment, observed through sensor data, in order to navigate safely to its destination. The robot carries a two-dimensional laser scanner, ultrasonic sensors, and an odometry sensor. Deep Deterministic Policy Gradient, which operates in continuous action spaces, was chosen as the deep reinforcement learning model. The robot is trained and tested in a Gazebo simulator with the Robot Operating System. After training, the robot is challenged to complete a waypoint navigation mission in four unknown areas as an assessment. The results indicate that the mobile robot is adaptable and capable of traveling to the specified waypoints and completing the job without a pre-drawn route or obstacle map in unidentified environments, with a minimum success rate of 69.7 percent.
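A small sketch of the continuous-action control loop this abstract describes, with a hypothetical actor network; the observation layout (2D laser scan plus odometry) follows the sensors listed above, and the velocity limits are illustrative.

```python
import numpy as np

def ddpg_act(actor, laser_scan, odom, noise_std=0.1, v_max=0.5, w_max=1.0):
    """Map sensor readings to a continuous (linear, angular) velocity command.

    `actor` is an assumed deterministic policy network taking a flat float32
    observation and returning two actions in [-1, 1]; Gaussian noise is added
    for exploration, DDPG-style, then clipped and rescaled to robot limits.
    """
    obs = np.concatenate([laser_scan, odom]).astype(np.float32)
    action = actor(obs)                                  # deterministic output
    action = action + np.random.normal(0.0, noise_std, size=2)
    action = np.clip(action, -1.0, 1.0)
    v = (action[0] + 1.0) / 2.0 * v_max                  # forward velocity only
    w = action[1] * w_max                                # signed angular velocity
    return v, w
```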
- Research Article
6
- 10.1109/access.2021.3118109
- Jan 1, 2021
- IEEE Access
This paper presents an automatic curriculum learning (ACL) method for object transportation based on deep reinforcement learning (DRL). Previous studies on object transportation using DRL suffer from a sparse reward problem: the agent receives a reward only upon completing the transportation of an object. Curriculum learning (CL) has generally been used to address sparse rewards, but conventional CL methods must be designed manually by users, which is difficult and tedious, and there has been no standard CL method for object transportation. We therefore propose an ACL method for object transportation that requires no human intervention during training: the robot designs curricula itself and trains iteratively according to them. First, we define the difficulty level of object transportation on a map, determined by the predicted travelling distance of the object and the presence of obstacles and walls. The robot first learns object transportation at an easy level (short travelling distance, few surrounding obstacles), then learns more difficult tasks (long travelling distance, many surrounding obstacles). Second, because training time also affects the performance of object transportation, we suggest an adaptive method for determining the number of training episodes, chosen according to the current success rate of object transportation. We verified the proposed method in simulation environments, where its success rate was 14% higher than with no curriculum, and between 63% (maximum) and 14% (minimum) higher than with manual curriculum methods. We also conducted real experiments to verify the gap between simulation and practical results.
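A hedged sketch of the adaptive episode-count idea summarized above: the number of episodes allotted to the current curriculum stage shrinks as the measured success rate rises. The linear schedule and bounds here are assumptions; the paper's exact rule may differ.

```python
def episodes_for_stage(success_rate, min_episodes=100, max_episodes=1000):
    """Adaptively pick how many episodes to train at the current difficulty.

    A high current success rate suggests the stage is nearly mastered, so
    fewer episodes are allotted before advancing to a harder curriculum stage.
    """
    success_rate = min(max(success_rate, 0.0), 1.0)  # clamp to [0, 1]
    span = max_episodes - min_episodes
    return int(max_episodes - span * success_rate)
```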
- Conference Article
5
- 10.1109/southeastcon44009.2020.9249654
- Mar 28, 2020
A study is presented on applying deep reinforcement learning (DRL) for visual navigation of wheeled mobile robots (WMR) in dynamic and unknown environments. Two DRL algorithms are considered: the value-learning deep Q-network (DQN) and the policy-gradient-based asynchronous advantage actor-critic (A3C). RGB (red, green, and blue) and depth images are used as inputs to both DRL algorithms to generate control commands for autonomous navigation of a WMR in simulation environments. The initial DRL networks were generated and trained progressively in OpenAI Gym Gazebo simulation environments within the Robot Operating System (ROS) framework for a popular target WMR, the Kobuki TurtleBot2. A pre-trained ResNet50 deep neural network, further trained with regrouped objects commonly found in a laboratory setting, was used for target-driven mapless visual navigation of the TurtleBot2 through DRL. The performance of A3C with multiple computation threads (4, 6, and 8) was simulated on a desktop. The navigation performance of the DQN and A3C networks, in terms of reward statistics and completion time, was compared in three simulation environments. As expected, A3C with multiple threads performed better than DQN, and its performance improved with the number of threads. Details of the methodology and simulation results are presented, and recommendations for future work toward real-time implementation through transfer learning of the DRL models are outlined.