Reinforcement learning (RL) has been successfully applied to underwater routing protocols owing to its strong capability for distributed decision making. However, traditional RL suffers from slow convergence and low learning efficiency in underwater environments. Meanwhile, many studies use RL to find paths with few hops rather than paths with short distances, even though the long communication distances in the ocean are a major cause of packet collisions and energy loss underwater. To address these problems, this paper proposes the PDDQN-HHVBF (Empirical Priority DDQN to Improve Hop-by-Hop Vector-Based Forwarding) protocol for M-UWSNs (Mobile source node Underwater Wireless Sensor Networks), in which an AUV (Autonomous Underwater Vehicle) acts as the source node, collecting data and transmitting it hop by hop to the Sink node through underwater nodes. The proposed protocol finds the optimal relay node within the routing pipeline of the HHVBF protocol by requesting, in real time, the maximum Q value based on three states: the residual energy of nodes, the number of candidate relay nodes, and the geographical location information of all candidate relay nodes. Because PDDQN-HHVBF avoids strong correlation between data samples, and its replayed samples are neither overly concentrated nor prone to overfitting, it converges rapidly in the underwater environment. Because the Q value request mechanism takes geographical location information into account, it can find optimal relay nodes with short propagation distances in large-scale networks; this reduces packet collisions, which in turn saves energy and extends network lifetime. Requesting Q values in real time also allows the protocol to cope with node drift caused by ocean currents. Furthermore, tying the Q value to the residual energy of nodes and the number of candidate relay nodes balances the load across nodes, prolongs network lifetime, and avoids routing holes.
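The relay-selection step and the two learning components implied by the protocol's name (double DQN targets and prioritized experience replay) can be sketched as follows. This is a minimal illustration under stated assumptions, not the protocol's implementation: the `Candidate` fields, the normalisation, the linear `q_value` stand-in for the neural network, and the names `select_relay`, `double_dqn_target` and `sample_prioritized` are all hypothetical. The proportional priority rule with exponent `alpha` follows the standard prioritized replay formulation and may differ from the paper's exact scheme.

```python
import random
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical view of one candidate relay inside the HHVBF pipeline."""
    node_id: int
    residual_energy: float   # remaining energy (assumed units)
    num_candidates: int      # size of this node's own candidate set (assumed meaning)
    distance_to_sink: float  # metres; stands in for "geographical location"


def features(c, max_energy, max_candidates, max_distance):
    """Normalise the three state components to [0, 1] (assumed encoding)."""
    return (
        c.residual_energy / max_energy,
        c.num_candidates / max_candidates,
        1.0 - c.distance_to_sink / max_distance,  # closer to sink -> larger value
    )


def q_value(weights, feats):
    # Hypothetical linear stand-in for the DQN output for one candidate.
    return sum(w * f for w, f in zip(weights, feats))


def select_relay(candidates, weights, max_energy, max_candidates, max_distance):
    """Return the candidate with the largest Q value."""
    return max(
        candidates,
        key=lambda c: q_value(weights, features(c, max_energy, max_candidates, max_distance)),
    )


def double_dqn_target(reward, gamma, next_q_online, next_q_target, done):
    """Double DQN: the online net picks the next action, the target net scores it."""
    if done:
        return reward
    best = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    return reward + gamma * next_q_target[best]


def sample_prioritized(td_errors, batch_size, alpha=0.6, eps=1e-6, rng=random):
    """Proportional prioritized replay: P(i) is proportional to (|delta_i| + eps)^alpha."""
    priorities = [(abs(d) + eps) ** alpha for d in td_errors]
    return rng.choices(range(len(td_errors)), weights=priorities, k=batch_size)
```

With equal weights, a high-energy candidate that has many onward candidates wins over one that is merely closer to the sink; shifting the weight onto the location term favours the closer node instead. In the double DQN example `double_dqn_target(1.0, 0.9, [0.2, 0.8, 0.5], [0.3, 0.4, 1.0], False)`, the online network selects action 1, so the target is 1.0 + 0.9 × 0.4 = 1.36 rather than the 1.9 that a plain max over the target network would give. This decoupling is what curbs Q-value overestimation.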
Finally, a “Store-Carry-Forward” mechanism is proposed for the AUV: when facing a routing hole, the AUV stores and carries packets until it finds the optimal relay node for forwarding, which significantly improves the PDR (packet delivery ratio) and saves the AUV's energy. The simulation results show that the proposed PDDQN-HHVBF protocol converges about 30% faster than DQELR. Although its delay is higher than that of DQELR and ROEVA because of the Q value requests, it outperforms VBF, HHVBF, DQELR, and ROEVA in terms of energy efficiency, PDR, and network lifetime. These metrics were evaluated by varying node speed from 0 m/s to 3 m/s with 1000 nodes, and by varying the number of nodes from 500 to 3000 at a speed of 1 m/s.
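The Store-Carry-Forward behaviour described above can be sketched as a buffer that the AUV drains only when a relay is available. This is a hypothetical sketch, not the paper's design: the class name `AUVBuffer`, the fixed `capacity`, the drop-oldest policy when the buffer is full, and the pluggable `select_relay` callable are all assumptions made for illustration.

```python
from collections import deque


class AUVBuffer:
    """Hypothetical Store-Carry-Forward buffer at the AUV source node."""

    def __init__(self, capacity):
        self.queue = deque()
        self.capacity = capacity
        self.dropped = 0

    def store(self, packet):
        # Assumed policy: drop the oldest packet when the buffer is full.
        if len(self.queue) >= self.capacity:
            self.queue.popleft()
            self.dropped += 1
        self.queue.append(packet)

    def step(self, candidates, select_relay):
        """Run at each AUV position update; return the (packet, relay) pairs sent."""
        if not candidates:
            # Routing hole: no transmission attempt, keep carrying the packets.
            return []
        relay = select_relay(candidates)
        sent = [(packet, relay) for packet in self.queue]
        self.queue.clear()
        return sent
```

The energy argument in the abstract corresponds to the early return: while the AUV sits in a routing hole it spends nothing on futile transmissions and simply carries its packets until `step` sees a non-empty candidate set. Here `select_relay` could be the max-Q selection used elsewhere in the protocol; any callable that picks one candidate works for this sketch.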