The traditional Deep Deterministic Policy Gradient (DDPG) algorithm often suffers a marked drop in success rate when a policy trained in complex simulation environments is transferred to new environments. To address this issue, this paper adopts a Multi-Environment (Multi-Env) parallel training approach, integrates Multi-Head Attention (MHA) and Prioritized Experience Replay (PER) into the DDPG framework, and optimizes the reward function, forming the MAP-DDPG algorithm. These modifications improve the algorithm's generalization capability and execution efficiency. The DDPG and MAP-DDPG algorithms were trained and tested comparatively in both simulation and real-world environments. In simulation tests, MAP-DDPG achieved an average 30% higher success rate and reduced the average time to reach the target point by 23.7 s compared to DDPG. These results indicate that MAP-DDPG substantially improves the generalization and execution efficiency of path planning, providing a more effective solution for path planning in complex environments.
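To make one of the components named above concrete, the following is a minimal sketch of a proportional Prioritized Experience Replay buffer in Python/NumPy, of the kind that could back the PER component of MAP-DDPG. All class, method, and parameter names (PrioritizedReplayBuffer, alpha, beta, eps) are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Minimal proportional-PER sketch; names and hyperparameters are illustrative."""

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha          # how strongly priorities skew the sampling distribution
        self.storage = []           # list of (state, action, reward, next_state, done) tuples
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0

    def add(self, transition):
        # New transitions get the current maximum priority so they are sampled at least once.
        max_prio = self.priorities.max() if self.storage else 1.0
        if len(self.storage) < self.capacity:
            self.storage.append(transition)
        else:
            self.storage[self.pos] = transition
        self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        prios = self.priorities[:len(self.storage)]
        probs = prios ** self.alpha
        probs /= probs.sum()
        indices = np.random.choice(len(self.storage), batch_size, p=probs)
        # Importance-sampling weights correct the bias from non-uniform sampling.
        weights = (len(self.storage) * probs[indices]) ** (-beta)
        weights /= weights.max()
        batch = [self.storage[i] for i in indices]
        return batch, indices, weights

    def update_priorities(self, indices, td_errors, eps=1e-6):
        # Priority is proportional to the magnitude of the TD error from the critic update.
        self.priorities[indices] = np.abs(td_errors) + eps
```

In a DDPG-style training loop, the critic's temporal-difference errors for the sampled batch would be passed back through update_priorities, so transitions with larger errors are replayed more often.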