AI-Driven Real-Time UAV Autonomous Trajectory Optimization Using Deep Reinforcement Learning in Dynamic and Partially Observable Environments

Abstract

Autonomous Unmanned Aerial Vehicle (UAV) navigation in dynamic and partially observable environments poses significant challenges, including real-time decision-making and robust obstacle avoidance. Traditional methods often struggle to adapt to such conditions, motivating more advanced approaches. In this work, we propose a Deep Reinforcement Learning (DRL) framework for trajectory optimization based on the Advantage Actor–Critic (A2C) algorithm. We further improve stability, learning speed, and generalization through automatic hyperparameter tuning with Optuna. The proposed system is validated in a Software-in-the-Loop (SITL) simulation using AirSim, which provides realistic flight dynamics and sensor feedback. Multi-modal observations, combining depth images, GPS, and target localization, improve situational awareness under partial observability. Experimental results show that A2C tuned with Optuna improves trajectory efficiency by 35.7%, reduces the collision rate to 0.97%, and achieves a 74% success rate, while cutting training time by 42%. These findings confirm the effectiveness of automated hyperparameter tuning for UAV motion planning and pave the way for real-world deployment of DRL-based UAV control systems. Furthermore, our study provides an in-depth comparison of training efficiency, convergence properties, and robustness across different algorithms, establishing a strong foundation for autonomous UAV navigation in challenging environments.
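The abstract names A2C without giving its update rule. As a rough illustration, the sketch below computes the standard A2C quantities for one rollout: bootstrapped n-step returns, advantages A_t = R_t - V(s_t), and the combined actor, critic, and entropy objective. It assumes plain Python floats in place of network outputs, and the coefficients `value_coef` and `entropy_coef` are illustrative defaults, not values reported by the authors.

```python
from typing import List, Sequence, Tuple


def nstep_returns(rewards: Sequence[float], values: Sequence[float],
                  dones: Sequence[bool], last_value: float,
                  gamma: float = 0.99) -> Tuple[List[float], List[float]]:
    """Bootstrapped n-step returns and advantages for one A2C rollout.

    R_t = r_t + gamma * R_{t+1} * (1 - done_t), seeded with R_T = V(s_T).
    A_t = R_t - V(s_t).
    """
    returns = [0.0] * len(rewards)
    running = last_value  # critic's estimate for the state after the rollout
    for t in reversed(range(len(rewards))):
        # A terminal step cuts the bootstrap chain.
        running = rewards[t] + gamma * running * (0.0 if dones[t] else 1.0)
        returns[t] = running
    advantages = [r - v for r, v in zip(returns, values)]
    return returns, advantages


def a2c_loss(log_probs: Sequence[float], advantages: Sequence[float],
             entropies: Sequence[float], value_coef: float = 0.5,
             entropy_coef: float = 0.01) -> float:
    """Scalar A2C objective to minimize.

    In an autodiff framework the advantages in the policy term would be
    detached so that only the value term trains the critic.
    """
    n = len(advantages)
    policy_loss = -sum(lp * a for lp, a in zip(log_probs, advantages)) / n
    value_loss = sum(a * a for a in advantages) / n  # critic regression error
    entropy = sum(entropies) / n                     # exploration bonus
    return policy_loss + value_coef * value_loss - entropy_coef * entropy


if __name__ == "__main__":
    rets, advs = nstep_returns([1.0, 1.0, 1.0], [0.5, 0.5, 0.5],
                               [False, False, False], last_value=0.0, gamma=0.5)
    print(rets)  # [1.75, 1.5, 1.0]
    print(advs)  # [1.25, 1.0, 0.5]
```

Quantities such as `gamma`, `value_coef`, and `entropy_coef` are typical candidates for an automated search like the Optuna tuning described above, although the abstract does not list which hyperparameters were tuned.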

Similar Papers
  • Research Article
  • Cited by 64
  • 10.3390/electronics13132432
Autonomous UAV Navigation with Adaptive Control Based on Deep Reinforcement Learning
  • Jun 21, 2024
  • Electronics
  • Yongfeng Yin + 4 more

Unmanned aerial vehicle (UAV) navigation is crucial to a UAV's ability to perform autonomous missions in complex environments. Most existing reinforcement learning methods for UAV navigation fix the flight altitude and velocity, which greatly simplifies the problem. However, methods without adaptive control are unsuitable for complex low-altitude environments and generally suffer from weak obstacle avoidance, and some UAV navigation studies with adaptive flight also have only weak obstacle avoidance capabilities. To address UAV navigation in low-altitude environments, we formulate autonomous UAV navigation in 3D environments with adaptive control as a Markov decision process and propose a deep reinforcement learning algorithm. To strengthen obstacle avoidance, we propose a guide attention method that shifts the focus of the UAV's decisions between the navigation task and the obstacle avoidance task as the obstacles change. We also introduce a novel velocity-constrained loss function, added to the original actor loss, to improve the UAV's velocity control. Simulation results demonstrate that our algorithm outperforms several state-of-the-art deep reinforcement learning algorithms on UAV navigation tasks in a 3D environment, with the average reward increasing by 9.35, the navigation success rate increasing by 14%, and the collision rate decreasing by 14%.
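The abstract does not give the form of the velocity-constrained loss. A minimal sketch, assuming (hypothetically) a squared-hinge penalty on commanded speeds that leave a band [v_min, v_max], added to the actor loss with a weight. The names `velocity_penalty`, `constrained_actor_loss`, and `weight` are illustrative, not the authors'.

```python
import math
from typing import Sequence


def velocity_penalty(velocities: Sequence[Sequence[float]],
                     v_min: float, v_max: float) -> float:
    """Mean squared violation of the speed band [v_min, v_max] over a batch.

    Hypothetical penalty form; the paper's exact constraint is not stated.
    """
    total = 0.0
    for v in velocities:
        speed = math.sqrt(sum(c * c for c in v))
        over = max(0.0, speed - v_max)    # too fast
        under = max(0.0, v_min - speed)   # too slow
        total += over * over + under * under
    return total / len(velocities)


def constrained_actor_loss(actor_loss: float,
                           velocities: Sequence[Sequence[float]],
                           v_min: float, v_max: float,
                           weight: float = 1.0) -> float:
    """Original actor loss plus a weighted velocity-constraint term."""
    return actor_loss + weight * velocity_penalty(velocities, v_min, v_max)


if __name__ == "__main__":
    batch = [[3.0, 4.0], [1.0, 0.0]]  # speeds 5.0 and 1.0
    print(velocity_penalty(batch, v_min=0.5, v_max=4.0))               # 0.5
    print(constrained_actor_loss(-2.0, batch, 0.5, 4.0, weight=2.0))   # -1.0
```

A squared hinge is zero inside the allowed band and grows smoothly outside it, so it only pushes on the actor when the commanded velocity actually violates the limits.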

  • Conference Article
  • Cited by 12
  • 10.1109/ccdc.2019.8832593
Vision-based Navigation of UAV with Continuous Action Space Using Deep Reinforcement Learning
  • Jun 1, 2019
  • Benchun Zhou + 3 more

Autonomous navigation of unmanned aerial vehicles (UAVs) in an unknown environment is a challenging task that attracts many researchers. Among the many existing solutions, deep reinforcement learning is a promising one. In this paper, we investigate a deterministic-policy actor-critic learning framework for vision-based navigation of an autonomous UAV in a simulated environment. In particular, we consider navigation with high-level input and continuous output. In simulation, the UAV observes a depth image from its front camera, while the target information is a vector; to combine them, we compute the track angle between the UAV and the target and encode it into the depth image, forming the state representation. In our framework, the actor network uses convolutional layers to process the high-level input, while the critic network uses a merge layer to balance state and action information. The experimental results support full control of an autonomous UAV through deep reinforcement learning, as the task is solved successfully. In addition, a comparison with other methods was conducted to further explore the advantages of the method.
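The abstract says the track angle is encoded into the depth image but does not say how. A minimal sketch under two assumptions: the track angle is taken as the target bearing relative to the UAV heading, wrapped to [-pi, pi), and it is encoded (hypothetically) by appending one extra image row filled with the angle scaled to [0, 1). Function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def track_angle(uav_xy: Tuple[float, float], uav_yaw: float,
                target_xy: Tuple[float, float]) -> float:
    """Angle from the UAV heading to the target bearing, wrapped to [-pi, pi)."""
    dx = target_xy[0] - uav_xy[0]
    dy = target_xy[1] - uav_xy[1]
    bearing = math.atan2(dy, dx)
    return (bearing - uav_yaw + math.pi) % (2 * math.pi) - math.pi


def encode_state(depth: Sequence[Sequence[float]], angle: float) -> List[List[float]]:
    """Append one extra image row filled with the track angle scaled to [0, 1).

    This row-append encoding is an assumption for illustration; the paper
    only states that the angle is encoded into the depth image.
    """
    width = len(depth[0])
    angle_row = [(angle + math.pi) / (2 * math.pi)] * width
    return [list(row) for row in depth] + [angle_row]


if __name__ == "__main__":
    # Target 90 degrees counter-clockwise from the current heading.
    a = track_angle((0.0, 0.0), 0.0, (0.0, 10.0))
    print(round(a, 4))  # 1.5708
    state = encode_state([[1.0, 2.0], [3.0, 4.0]], 0.0)
    print(state[-1])  # [0.5, 0.5]
```

Keeping the angle inside the image lets a single convolutional actor consume both inputs, which matches the abstract's description of a convolutional actor on high-level input.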

  • Research Article
  • Cited by 160
  • 10.1016/j.cja.2020.05.011
UAV navigation in high dynamic environments: A deep reinforcement learning approach
  • Jun 17, 2020
  • Chinese Journal of Aeronautics
  • Tong Guo + 5 more

  • Research Article
  • Cited by 2
  • 10.1109/access.2025.3531931
Autonomous Real-Time Smoothness Control for Reliable DDQN-Based UAV Navigation Using Cellular Networks
  • Jan 1, 2025
  • IEEE Access
  • Ghada Afifi + 1 more

Reliable Unmanned Aerial Vehicle (UAV) navigation in urban environments is a crucial prerequisite for major civilian and military applications. Many existing Global Positioning System (GPS)-based UAV navigation solutions do not meet performance requirements because GPS is unreliable in urban environments. In this paper, we present a smooth trajectory planning approach that generates reliable UAV trajectories with less chatter and fewer sharp turns. We propose using broadcast signals from existing cellular networks to navigate the UAV from a given source to a destination in urban environments, independent of GPS or other transmissible signals. For this purpose, we formulate smooth trajectory planning as an optimization problem that provides a probabilistic guarantee on the success of the UAV mission while respecting the UAV's dynamic and kinematic constraints. We use optimization-based techniques to determine the optimal bound of the solution for benchmarking purposes. Next, we propose a machine learning-based approach that provides a practical real-time solution to the formulated UAV navigation problem. Finally, we present an in-depth comparative analysis evaluating the proposed double deep Q-network (DDQN)-based technique against other solutions from the literature.
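The abstract names a DDQN-based technique without details. The sketch below contrasts the standard DQN target with the Double DQN target, in which the online network selects the next action and the target network evaluates it. The Q-functions are hypothetical plain callables standing in for neural networks; the authors' state, action, and reward design for cellular-signal navigation is not represented.

```python
from typing import Callable, Sequence

# Hypothetical stand-ins for the online and target networks: map a state to
# one Q-value per discrete action.
QFunction = Callable[[object], Sequence[float]]


def dqn_target(reward: float, next_state: object, done: bool,
               q_target: QFunction, gamma: float = 0.99) -> float:
    """Standard DQN target: the target network both selects and evaluates."""
    if done:
        return reward
    return reward + gamma * max(q_target(next_state))


def ddqn_target(reward: float, next_state: object, done: bool,
                q_online: QFunction, q_target: QFunction,
                gamma: float = 0.99) -> float:
    """Double DQN target: online net selects the action, target net evaluates it."""
    if done:
        return reward
    q_sel = q_online(next_state)
    best_action = max(range(len(q_sel)), key=lambda a: q_sel[a])
    return reward + gamma * q_target(next_state)[best_action]


if __name__ == "__main__":
    online = lambda s: [1.0, 5.0]   # online net prefers action 1
    target = lambda s: [10.0, 2.0]  # target net rates action 0 highly
    print(dqn_target(0.0, None, False, target, gamma=0.5))           # 5.0
    print(ddqn_target(0.0, None, False, online, target, gamma=0.5))  # 1.0
```

Decoupling action selection from evaluation is what reduces the overestimation bias of the max operator in plain DQN.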

  • Research Article
  • Cited by 7
  • 10.1080/08839514.2022.2084473
A Novel Augmentative Backward Reward Function with Deep Reinforcement Learning for Autonomous UAV Navigation
  • Jul 6, 2022
  • Applied Artificial Intelligence
  • Manit Chansuparp + 1 more

Autonomous UAV (unmanned aerial vehicle) navigation has recently attracted increasing interest from both academia and industry due to its potential uses in various fields, especially given the need for social distancing during the pandemic. Many works have adopted deep deterministic policy gradient (DDPG), a deep reinforcement learning (RL) method with experience replay, to control UAV motion and have achieved high accuracy in static and simplified environments. However, these approaches are still far from ready for real-world adoption, where UAVs must operate under complex and dynamic conditions. We also found that using DDPG alone makes the learning process prone to oscillation and inefficient for tasks with high-dimensional state-action spaces. Furthermore, the goal reward mechanism in traditional reward functions introduces a bias toward states that resemble those in the goal area, leading to erroneous action selection. To move closer to real-world adoption, we propose a novel method that enables UAVs to handle motion control in realistic environments. The first component of our method is point cloud data (PCD) simplification with a truncated icosahedron structure, which converts an enormous PCD into a few essential data points. In the second component, we replace the traditional goal reward mechanism with a new mechanism called the Augmentative Backward Reward (ABR) function, which distributes the goal reward to transitions in proportion to their participation. By integrating simplified PCD and ABR, we achieved significantly better results than with the state-of-the-art TD3 alone. In addition, we tested the proposed method on another navigation task, BipedalWalkerHardcore, an RL testbed, and the results are still better and steadier than those of TD3. These results indicate that the proposed method is robust.
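The abstract does not explain how the truncated icosahedron simplifies the point cloud. One plausible reading, stated here as an assumption rather than the authors' procedure: use the 32 face normals of a truncated icosahedron (12 pentagons, 20 hexagons) as direction bins around the UAV and keep only the nearest return in each bin, giving a fixed 32-value observation. Function names and `max_range` are hypothetical.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]
PHI = (1.0 + math.sqrt(5.0)) / 2.0  # golden ratio


def _unit(v: Sequence[float]) -> Vec:
    n = math.sqrt(sum(c * c for c in v))
    return (v[0] / n, v[1] / n, v[2] / n)


def truncated_icosahedron_directions() -> List[Vec]:
    """Unit normals of the 32 faces of a truncated icosahedron.

    The 12 pentagons sit over the vertices of an icosahedron and the
    20 hexagons over its triangular faces, so both sets of directions
    can be read off the icosahedron (edge length 2 in these coordinates).
    """
    verts = []
    for s1 in (1.0, -1.0):
        for s2 in (1.0, -1.0):
            verts += [(0.0, s1, s2 * PHI), (s1, s2 * PHI, 0.0), (s1 * PHI, 0.0, s2)]

    def is_edge(a: Vec, b: Vec) -> bool:
        return abs(sum((x - y) ** 2 for x, y in zip(a, b)) - 4.0) < 1e-9

    faces = [f for f in itertools.combinations(verts, 3)
             if is_edge(f[0], f[1]) and is_edge(f[1], f[2]) and is_edge(f[0], f[2])]
    centres = [tuple(sum(c) / 3.0 for c in zip(*f)) for f in faces]
    return [_unit(v) for v in verts] + [_unit(c) for c in centres]


def simplify_point_cloud(points: Sequence[Vec], max_range: float = 20.0) -> List[float]:
    """Reduce a point cloud to the nearest return in each of 32 direction bins."""
    dirs = truncated_icosahedron_directions()
    nearest = [max_range] * len(dirs)  # empty bins read as "free up to max_range"
    for p in points:
        d = math.sqrt(p[0] ** 2 + p[1] ** 2 + p[2] ** 2)
        if d == 0.0 or d >= max_range:
            continue
        u = (p[0] / d, p[1] / d, p[2] / d)
        # Assign the point to the face normal it is most aligned with.
        k = max(range(len(dirs)), key=lambda i: sum(a * b for a, b in zip(u, dirs[i])))
        nearest[k] = min(nearest[k], d)
    return nearest


if __name__ == "__main__":
    obs = simplify_point_cloud([(5.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 0.0, -8.0)])
    print(len(obs), sorted(obs)[:2])  # 32 [3.0, 8.0]
```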

  • Research Article
  • Cited by 1
  • 10.15294/rji.v3i1.4430
QR-Code Based Visual Servoing and Target Recognition to Improve Payload Release Accuracy in Air Delivery Missions using Fully Autonomous Quad-Copter UAV
  • Mar 28, 2025
  • Recursive Journal of Informatics
  • Bondan Eka Nugraha + 1 more

Unmanned Aerial Vehicles (UAVs) are increasingly utilized for package delivery due to their efficiency and automation capabilities. UAVs can execute autonomous flight missions using Global Positioning System (GPS)-based navigation. However, challenges arise in the final stage of delivery, known as the last-mile delivery problem. The limitations of GPS-based navigation, the absence of recipient authentication, and shifting drop-off points create reliability and safety concerns. External factors such as varied environmental topography further contribute to delivery inaccuracies, highlighting the need for a more precise approach.

Purpose: Many studies have explored UAV navigation and delivery systems, but challenges in last-mile delivery remain unresolved. This research introduces an improved UAV delivery system using computer vision (CV) and image-based visual servoing (IBVS) with QR Codes as location markers. The aim is to enhance UAV navigation accuracy and recipient verification, ensuring more reliable package deliveries.

Methods/Study design/approach: The study implements a CV-based navigation system in which QR Codes serve as landing markers for UAVs. Image processing is performed on a companion computer linked to the UAV's flight control system. The IBVS method enables UAVs to adjust their position in real time, minimizing GPS errors. Recipient verification is performed by scanning the QR Code before releasing the package. The system is tested through computer simulations and real flight experiments to assess accuracy and effectiveness.

Result/Findings: Experimental results demonstrate that UAVs equipped with the IBVS method can successfully complete package delivery missions with improved accuracy. GPS errors are corrected by aligning the UAV's position with QR Code markers, and recipient authentication is verified before package release. Real-flight tests confirm that this approach significantly enhances UAV delivery reliability compared to conventional GPS-based navigation.

Novelty/Originality/Value: This research presents a novel integration of computer vision and UAV navigation for addressing last-mile delivery challenges. By leveraging IBVS and QR Code-based authentication, UAVs can perform fully autonomous, precise, and secure package deliveries. This method offers a viable solution for improving UAV-based logistics, reducing delivery errors and enhancing operational safety.
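The abstract uses IBVS but gives no control law. The classic IBVS law is v = -λ L⁺ e, where e is the image-feature error and L the interaction matrix. The sketch below reduces it, as a simplifying assumption, to translation-only centering of the QR code centre with a known marker depth Z and vz = 0, computed in the camera frame. Mapping the result to UAV body axes depends on how the camera is mounted and is not shown; function names and the `gain` value are illustrative.

```python
from typing import Tuple


def pixel_to_normalized(u: float, v: float, fx: float, fy: float,
                        cx: float, cy: float) -> Tuple[float, float]:
    """Pinhole model: pixel coordinates -> normalized image coordinates."""
    return (u - cx) / fx, (v - cy) / fy


def ibvs_centering_velocity(u: float, v: float, depth: float,
                            fx: float, fy: float, cx: float, cy: float,
                            gain: float = 0.5) -> Tuple[float, float]:
    """Camera-frame (vx, vy) that drives the marker centre to the image centre.

    Assumes pure translation with vz = 0 and a known marker depth Z. For a
    static point, x_dot = -vx / Z, so vx = gain * Z * x yields
    x_dot = -gain * x (exponential decay of the image error).
    """
    x, y = pixel_to_normalized(u, v, fx, fy, cx, cy)
    return gain * depth * x, gain * depth * y


if __name__ == "__main__":
    # QR code centre detected 60 px right of the principal point.
    print(ibvs_centering_velocity(380.0, 240.0, depth=5.0,
                                  fx=600.0, fy=600.0, cx=320.0, cy=240.0))  # (0.25, 0.0)
```

A full IBVS controller would also use rotational velocity and several feature points (for example, the four QR code corners) to align yaw and control descent.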

  • Research Article
  • Cited by 1
  • 10.1002/ett.70111
Flight Evolution: Decoding Autonomous UAV Navigation—Fundamentals, Taxonomy, and Challenges
  • Mar 19, 2025
  • Transactions on Emerging Telecommunications Technologies
  • Geeta Sharma + 1 more

Due to the adaptability and effectiveness of autonomous unmanned aerial vehicles (UAVs) in completing challenging tasks, research on UAVs has grown rapidly over the past few years. An autonomous UAV is a drone that navigates an unknown environment with minimal human interaction. However, when operating in a dynamic environment, UAVs face numerous difficulties, including scene mapping and localization, object recognition and avoidance, path planning, and emergency landing. Real-time UAV operation demands quick responses to changing situations, making this a crucial feature that requires further research. This article presents several novel taxonomies that briefly explain UAVs and the communication architecture used between UAVs and ground stations. Popular UAV databases and the fundamentals of autonomous navigation, including the latest object detection and avoidance methods, path planning techniques, and trajectory mechanisms, are also explained. We then cover the available benchmark datasets and the different kinds of simulators used for UAVs, and discuss several research challenges. The literature shows that algorithms based on deep reinforcement learning (DRL) are employed more frequently than other intelligent algorithms in UAV navigation. To the best of our knowledge, this is the first article that covers these different aspects of UAV navigation.

  • Conference Article
  • Cited by 10
  • 10.1109/dessert50317.2020.9124998
The Algorithm of UAV Automatic Landing System Using Computer Vision
  • May 1, 2020
  • Kostiantyn Dergachov + 2 more

This paper discusses vision-system algorithms for automatic UAV (Unmanned Aerial Vehicle) landing. The basic algorithms for finding a helipad and adapting the UAV control system to autonomous landing are presented. The authors consider algorithms for solving navigation problems and build block diagrams of automatic UAV control. The developed algorithms make it possible to implement an automatic UAV landing system based on technical vision with various camera parameters, and support further research as part of the autonomous navigation of unmanned aerial vehicles. The article describes a developed algorithm that extracts characteristic points for visual navigation, together with a search system for these characteristic points that enables automatic UAV landing. These algorithms may be useful for autonomous UAV landing systems and can also be used to track the UAV's trajectory.

  • Book Chapter
  • 10.1201/9781003663461-9
Autonomous Combat Drones and UAV Navigation Using Deep Reinforcement Learning for Target Engagement and Mission Execution
  • Mar 19, 2026
  • Suveg Moudgil + 3 more

Autonomous combat drones and Unmanned Aerial Vehicles (UAVs) are now integral to modern warfare, enabling precise strikes, high-speed reconnaissance, and real-time decision-making with minimal human support. Traditional UAV control systems are rigid and vulnerable in dynamic, GPS-denied, or hostile environments because they often depend on pre-programmed flight commands, GPS navigation, and operator commands. These limitations reduce their reactivity, flexibility, and ability to complete difficult missions independently. We propose the Combat-Ready UAV Intelligence System (CR-UIS), a Deep Reinforcement Learning (DRL)-based system that enables fully autonomous navigation, threat analysis, and engagement in combat scenarios. Its hybrid architecture combines Twin Delayed DDPG (TD3) and Advantage Actor-Critic (A2C) to optimize both discrete mission choices, such as strike timing, evasion, and targeting, and continuous flight control. Onboard sensors, including LiDAR, thermal sensors, video streams, and inertial measurement units, feed data to CR-UIS. To maximize mission success and minimize exposure to danger, the DRL agent continually learns to adjust its trajectory, altitude, and weapon systems. Through extensive training in representative flight conditions and high-fidelity battle simulators, the UAV gains enhanced situational awareness, collision avoidance, and autonomous target prioritization. In experimental testing, CR-UIS achieved a mission success rate of 93.6%, outperforming hand-programmed waypoint navigation (77.9%) and conventional GPS-dependent autopilot systems (84.2%). The system also proves more versatile in hostile environments, with quicker target detection and strike capability. The results validate CR-UIS as a viable solution for next-generation autonomous combat air vehicles.

  • Conference Article
  • Cited by 1
  • 10.1109/globecom46510.2021.9685830
Learning to Navigate for Secure UAV Communication
  • Dec 1, 2021
  • Xiangyu Zhang + 2 more

In this paper, we investigate navigation for unmanned aerial vehicles (UAVs) in a secure communication system, designing the UAV's navigation trajectory to ensure Quality of Service (QoS) with the Base Station (BS) in the presence of multiple dynamic eavesdroppers and jammers at unknown locations. To this end, we formulate a UAV trajectory optimization problem that minimizes mission completion time subject to QoS and security constraints. Imperfect information, a dynamic communication environment, and non-convexity make the problem intractable. We therefore propose a novel approach, the Model-Assisted Reinforcement Learning (MARL) algorithm, which embeds the communication system model into a Deep Reinforcement Learning (DRL) framework to ensure secure communication and shorten the learning process. Numerical results show that the proposed method safeguards security and finds the shortest path to complete the mission.
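The abstract states the objective (minimum completion time under QoS and security constraints) but not the MARL reward. A hypothetical per-step reward that encodes that objective is sketched below: a constant time cost per step, penalties when the link rate to the BS or the secrecy rate falls below a threshold, and a bonus on arrival. The secrecy rate uses the standard Gaussian wiretap form max(0, log2(1 + SNR_BS) - log2(1 + SNR_E)) against the strongest eavesdropper; jamming is assumed to be already reflected in `snr_bs`. All thresholds and names are illustrative.

```python
import math
from typing import Sequence


def secrecy_rate(snr_bs: float, snr_eves: Sequence[float]) -> float:
    """Secrecy rate (bit/s/Hz) against the strongest eavesdropper."""
    worst = max(snr_eves) if snr_eves else 0.0
    return max(0.0, math.log2(1.0 + snr_bs) - math.log2(1.0 + worst))


def step_reward(snr_bs: float, snr_eves: Sequence[float], reached_goal: bool,
                rate_min: float = 1.0, secrecy_min: float = 0.5,
                penalty: float = 5.0, goal_bonus: float = 100.0) -> float:
    """Hypothetical shaped reward for time-minimal, QoS- and security-aware flight."""
    reward = -1.0  # time cost per step, so shorter missions score higher
    if math.log2(1.0 + snr_bs) < rate_min:
        reward -= penalty  # QoS constraint violated
    if secrecy_rate(snr_bs, snr_eves) < secrecy_min:
        reward -= penalty  # security constraint violated
    if reached_goal:
        reward += goal_bonus
    return reward


if __name__ == "__main__":
    print(secrecy_rate(15.0, [3.0]))         # 2.0
    print(step_reward(15.0, [3.0], False))   # -1.0
    print(step_reward(1.0, [3.0], False))    # -6.0
```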

  • Research Article
  • Cited by 22
  • 10.1109/tvt.2024.3425755
Integrated Learning-Based Framework for Autonomous Quadrotor UAV Landing on a Collaborative Moving UGV
  • Nov 1, 2024
  • IEEE Transactions on Vehicular Technology
  • Chang Wang + 6 more

Autonomous unmanned aerial vehicle (UAV) landing on a moving unmanned ground vehicle (UGV) remains a challenge because it is difficult for the UAV to track the UGV's real-time state and adjust its landing policy accordingly. This paper proposes a learning framework for a quadrotor UAV to land on a moving UGV without knowing the UGV's motion dynamics. Specifically, the framework consists of two main systems: a Landing Vision System (LVS) using deep learning and a Landing Control System (LCS) using deep reinforcement learning. The LVS enables the UAV to recognize and localize the UGV in real time to estimate their relative position and velocity. Moreover, the UGV is tracked within the UAV's field of view using consecutive images, alleviating the tracking failure problem. We propose a Memory Consolidated TD3 (MCTD3) algorithm to generate optimal policies for precise tracking and landing control of the UAV. In addition, we propose an adaptive COACH (ACOACH) algorithm that allows human intervention in the UAV's action space to speed up training. We demonstrate the effectiveness of the proposed method in both simulation and real-world experiments.
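MCTD3 builds on TD3, but the abstract does not describe the memory consolidation part. The sketch below therefore shows only the standard TD3 critic target: target policy smoothing (clipped Gaussian noise on the target action) and clipped double-Q (the minimum of two target critics). The networks are hypothetical plain callables, and `noise_std=0.2` and `noise_clip=0.5` are commonly used defaults, not values from this paper.

```python
import random
from typing import Callable, List, Optional, Sequence

# Hypothetical stand-ins for the target networks.
Actor = Callable[[object], Sequence[float]]
Critic = Callable[[object, Sequence[float]], float]


def _clip(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def td3_target(reward: float, next_state: object, done: bool,
               actor_target: Actor, q1_target: Critic, q2_target: Critic,
               gamma: float = 0.99, noise_std: float = 0.2, noise_clip: float = 0.5,
               action_low: float = -1.0, action_high: float = 1.0,
               rng: Optional[random.Random] = None) -> float:
    """TD3 critic target: target policy smoothing + clipped double-Q."""
    if done:
        return reward
    rng = rng or random.Random()
    # Target policy smoothing: add clipped Gaussian noise to the target action.
    smoothed: List[float] = [
        _clip(a + _clip(rng.gauss(0.0, noise_std), -noise_clip, noise_clip),
              action_low, action_high)
        for a in actor_target(next_state)
    ]
    # Clipped double-Q: take the smaller of the two target critics.
    q_min = min(q1_target(next_state, smoothed), q2_target(next_state, smoothed))
    return reward + gamma * q_min


if __name__ == "__main__":
    actor = lambda s: [0.5]
    q1 = lambda s, a: 10.0 + a[0]
    q2 = lambda s, a: 4.0 + a[0]
    print(td3_target(1.0, None, False, actor, q1, q2, gamma=0.5, noise_std=0.0))  # 3.25
```

Taking the minimum of two critics counteracts overestimation, and smoothing the target action discourages the policy from exploiting sharp peaks in the learned Q-function.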

  • Research Article
  • Cited by 163
  • 10.1109/tvt.2019.2952549
Deep Reinforcement Learning for UAV Navigation Through Massive MIMO Technique
  • Jan 1, 2020
  • IEEE Transactions on Vehicular Technology
  • Hongji Huang + 5 more

Unmanned aerial vehicle (UAV) technology has been recognized as a promising solution for future wireless connectivity from the sky, and UAV navigation is one of the most significant open research problems, attracting wide interest in the research community. However, current UAV navigation schemes are unable to capture UAV motion and select the best UAV-ground links in real time, and these weaknesses severely degrade navigation performance. To tackle these fundamental limitations, we combine state-of-the-art deep reinforcement learning with UAV navigation through the massive multiple-input multiple-output (MIMO) technique. Specifically, we design a deep Q-network (DQN) that optimizes UAV navigation by selecting the optimal policy, and we propose a learning mechanism for training the DQN. The DQN is trained so that the agent can make navigation decisions based on received signal strengths, with the aid of Q-learning. Simulation results corroborate the superiority of the proposed schemes in terms of coverage and convergence compared with other schemes.

  • Research Article
  • Cited by 11
  • 10.1016/j.eij.2024.100556
3D path planning for UAV based on A hybrid algorithm of marine predators algorithm with quasi-oppositional learning and differential evolution
  • Dec 1, 2024
  • Egyptian Informatics Journal
  • Binbin Tu + 2 more

  • Conference Article
  • Cited by 5
  • 10.2514/6.2015-0989
Autonomous Wall-Following Based Navigation of Unmanned Aerial Vehicles in Indoor Environments
  • Jan 2, 2015
  • AIAA Infotech @ Aerospace
  • Alireza Nemati + 3 more

This paper presents a wall-tracking approach for navigating a quadrotor Unmanned Aerial Vehicle (UAV) in indoor environments. Indoor UAV navigation is particularly challenging because Global Positioning System (GPS) data are unavailable. Furthermore, the lack of prior knowledge about obstacles makes a priori path planning difficult. In this paper, a wall-tracking method is proposed to enable a quadrotor UAV to navigate across an indoor environment. The UAV is equipped with four proximity sensors around it, each giving the distance to the nearest obstacle along its direction. Hence, the UAV can track walls and obstacles while keeping a fixed distance from the wall on its left. An autonomous algorithm is developed for this purpose that enables the UAV to detect and turn around corners, slanted walls, and closed areas. The UAV starts from a random position, moves toward the nearest wall, and begins tracking the wall from there. This approach is particularly useful for navigating areas that consist of multiple rooms, where the UAV must cross one room to enter another. Such capabilities can find applications in situations that might otherwise be inaccessible or dangerous for humans, such as collapsed buildings, radioactive or HazMat conditions, and small conduits.
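The abstract describes left-wall following with four proximity sensors but gives no thresholds or gains. A minimal rule-based sketch under those assumptions: rotate right when the front is blocked (inner corner), curve left when the left wall disappears (outer corner or doorway), and otherwise hold the desired left-wall distance with a proportional lateral command. All names, thresholds, and gains are hypothetical; the back sensor is carried but unused in this simplified rule set.

```python
from dataclasses import dataclass


@dataclass
class Ranges:
    front: float  # m, distance to nearest obstacle ahead
    left: float
    right: float
    back: float   # not used by this simplified controller


@dataclass
class Command:
    forward: float   # m/s along body x
    lateral: float   # m/s, positive = toward the left wall
    yaw_rate: float  # rad/s, positive = counter-clockwise (turn left)


def wall_follow_left(r: Ranges, d_ref: float = 1.0, front_min: float = 1.2,
                     lost_wall: float = 2.5, cruise: float = 0.5,
                     k_lat: float = 0.8, turn_rate: float = 0.6) -> Command:
    """One control step of a hypothetical left-wall-following rule set."""
    if r.front < front_min:
        # Wall ahead (inner corner): stop and rotate right, keeping the wall on the left.
        return Command(0.0, 0.0, -turn_rate)
    if r.left > lost_wall:
        # Wall vanished on the left (outer corner or doorway): curve left around it.
        return Command(0.5 * cruise, 0.0, turn_rate)
    # Proportional control of the distance to the left wall.
    return Command(cruise, k_lat * (r.left - d_ref), 0.0)


if __name__ == "__main__":
    print(wall_follow_left(Ranges(front=5.0, left=1.5, right=3.0, back=3.0)))
    # Command(forward=0.5, lateral=0.4, yaw_rate=0.0)
```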

  • Research Article
  • Cited by 21
  • 10.3390/drones8090516
UAV Autonomous Navigation Based on Deep Reinforcement Learning in Highly Dynamic and High-Density Environments
  • Sep 23, 2024
  • Drones
  • Yuanyuan Sheng + 3 more

Autonomous navigation of Unmanned Aerial Vehicles (UAVs) based on deep reinforcement learning (DRL) has made great progress. However, most studies assume relatively simple task scenarios and do not consider the impact of complex task scenarios on UAV flight performance. This paper proposes a DRL-based autonomous navigation algorithm that enables autonomous UAV path planning in high-density and highly dynamic environments. Based on an analysis of how changes in UAV position and angle affect navigation performance in complex environments, the algorithm introduces a state space representation containing both position and angle information. In addition, a dynamic reward function is constructed on top of a non-sparse reward function to balance the agent's conservative and exploratory behavior during training. The results of multiple comparative experiments show that the proposed algorithm achieves both the best autonomous navigation performance and the highest flight efficiency in complex environments.
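The abstract does not give the dynamic reward function. One hypothetical design in its spirit is sketched below: a dense progress term (reduction in distance to the goal) plus an obstacle-proximity penalty whose weight ramps up linearly over training, so early training tolerates exploratory flight near obstacles and later training favors conservative behavior. The schedule direction, weights, and terminal values are assumptions, not the authors' settings.

```python
def obstacle_weight(step: int, total_steps: int,
                    w_start: float = 0.2, w_end: float = 1.0) -> float:
    """Linearly ramp the obstacle-penalty weight from w_start to w_end."""
    frac = min(1.0, max(0.0, step / total_steps))
    return w_start + (w_end - w_start) * frac


def dense_reward(d_goal_prev: float, d_goal: float, d_obstacle: float,
                 step: int, total_steps: int, d_safe: float = 2.0,
                 w_progress: float = 1.0, collided: bool = False,
                 reached: bool = False) -> float:
    """Hypothetical non-sparse reward with a training-time-dependent penalty."""
    if collided:
        return -100.0
    if reached:
        return 100.0
    progress = w_progress * (d_goal_prev - d_goal)           # > 0 when closing in
    proximity = max(0.0, d_safe - d_obstacle) / d_safe       # 0 outside safe radius
    return progress - obstacle_weight(step, total_steps) * proximity


if __name__ == "__main__":
    print(dense_reward(10.0, 9.0, 5.0, step=0, total_steps=100))   # 1.0
    print(dense_reward(10.0, 10.0, 1.0, step=0, total_steps=100))  # -0.1
```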
