In this paper, we address real-time navigation and obstacle avoidance for automated guided vehicles (AGVs) in dynamic environments, a central problem in collaborative control of AGVs. To avoid the computational cost of recalculating optimal paths at every control step, we propose an improved Soft Actor–Critic (SAC) reinforcement learning methodology that combines a composite auxiliary reward structure with sum-tree prioritized experience replay (SAC-SP) to achieve real-time optimal feedback control. First, we formulate the navigation task as a Markov Decision Process that accounts for both static and dynamic obstacles. To accelerate learning, we introduce a novel composite auxiliary reward strategy. We then train the AGVs with the proposed SAC-SP methodology to handle real-time navigation under this reward structure. The trained policy network generates effective on-board optimal feedback actions from the obstacle positions, the target, and the AGV state. Simulation experiments demonstrate that the proposed method steers AGVs to their destinations with strong robustness to initial conditions and varied obstacle configurations, generating optimal feedback actions in the shortest computation time.
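Since the abstract names sum-tree prioritized experience replay as a core component of SAC-SP, the following minimal Python sketch illustrates the underlying data structure under the standard proportional-prioritization scheme. The class and method names are illustrative assumptions for exposition, not the paper's implementation.

```python
import numpy as np

class SumTree:
    """Binary sum tree over leaf priorities, supporting O(log n)
    priority updates and proportional sampling (illustrative sketch)."""

    def __init__(self, capacity):
        self.capacity = capacity                 # max number of stored transitions
        self.tree = np.zeros(2 * capacity - 1)   # internal nodes + leaves
        self.data = [None] * capacity            # transition storage
        self.write = 0                           # next leaf slot to overwrite
        self.size = 0

    def total(self):
        return self.tree[0]                      # root holds the sum of all priorities

    def add(self, priority, transition):
        leaf = self.write + self.capacity - 1    # leaf index for this slot
        self.data[self.write] = transition
        self.update(leaf, priority)
        self.write = (self.write + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def update(self, leaf, priority):
        change = priority - self.tree[leaf]
        self.tree[leaf] = priority
        while leaf != 0:                         # propagate the change up to the root
            leaf = (leaf - 1) // 2
            self.tree[leaf] += change

    def sample(self, s):
        """Descend from the root to the leaf whose cumulative
        priority interval contains s, with 0 <= s < total()."""
        idx = 0
        while 2 * idx + 1 < len(self.tree):      # stop once a leaf is reached
            left = 2 * idx + 1
            if s <= self.tree[left]:
                idx = left
            else:
                s -= self.tree[left]
                idx = left + 1
        return idx, self.tree[idx], self.data[idx - self.capacity + 1]

# Usage: draw a minibatch proportionally to priority by splitting the
# total priority mass into equal segments and sampling one point in each.
tree = SumTree(capacity=8)
for i in range(8):
    tree.add(priority=i + 1.0, transition={"step": i})

batch_size = 4
segment = tree.total() / batch_size
for k in range(batch_size):
    s = np.random.uniform(k * segment, (k + 1) * segment)
    leaf, priority, transition = tree.sample(s)
```

The sum tree is what makes prioritized replay practical at scale: both updating a transition's priority after a new TD-error estimate and drawing a sample proportional to priority cost O(log n), versus O(n) for a flat priority array.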