Underwater Wireless Sensor Networks (UWSNs) face unique constraints imposed by the unstructured and dynamic underwater environment. Efficient data gathering from these networks is crucial because node energy resources are limited. Routing protocols are therefore needed that optimize energy consumption, extend network lifetime, and enhance data delivery. In this work, we develop an Adaptive Distributed Routing Protocol for UWSNs using Deep Q-Learning (ADRP-DQL). The protocol leverages reinforcement learning to dynamically learn routing decisions from the network's state and action-value estimates, allowing nodes to make intelligent forwarding choices based on residual energy, depth, and node degree. A Deep Q-Network (DQN) serves as the function approximator that estimates action values and selects the best routing decisions, and it is trained using both off-policy and on-policy strategies. Simulation results demonstrate that ADRP-DQL performs well in terms of energy efficiency, data delivery ratio, and network lifetime, highlighting the proposed protocol's effectiveness and adaptability to UWSNs. ADRP-DQL thus contributes to intelligent routing for UWSNs, offering a promising approach to enhancing performance and optimizing energy utilization in these demanding environments.
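To make the decision loop concrete, the following is a minimal, self-contained sketch of how a DQN could score candidate next hops from the three state signals named above (residual energy, depth, node degree). This is an illustration, not the paper's implementation: the network architecture, hyperparameters, feature normalization, and reward value are all assumptions introduced here for the example.

```python
import random
import torch
import torch.nn as nn

# Hypothetical per-candidate feature vector: the state signals from the
# abstract (residual energy, depth, node degree), assumed normalized to [0, 1].
FEATURES = 3
GAMMA = 0.9      # discount factor (illustrative assumption)
EPSILON = 0.1    # exploration rate (illustrative assumption)

# Q-network: maps one candidate next hop's features to a scalar Q-value.
q_net = nn.Sequential(
    nn.Linear(FEATURES, 32), nn.ReLU(),
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, 1),
)
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def choose_next_hop(candidates):
    """Epsilon-greedy selection among candidate next hops.

    candidates: list of [residual_energy, depth, node_degree] vectors.
    Returns the index of the chosen neighbor.
    """
    if random.random() < EPSILON:
        return random.randrange(len(candidates))  # explore
    with torch.no_grad():
        q = q_net(torch.tensor(candidates, dtype=torch.float32))
    return int(q.argmax())  # exploit: highest estimated action value

def td_update(chosen, reward, next_candidates, done):
    """One off-policy TD(0) step toward r + gamma * max_a' Q(s', a')."""
    q_sa = q_net(torch.tensor(chosen, dtype=torch.float32)).squeeze()
    with torch.no_grad():
        target = torch.tensor(reward)
        if not done and next_candidates:
            nxt = q_net(torch.tensor(next_candidates, dtype=torch.float32))
            target = target + GAMMA * nxt.max()
    loss = nn.functional.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Toy usage: three neighbors described by (energy, depth, degree) features.
neighbors = [[0.9, 0.2, 0.5], [0.4, 0.1, 0.8], [0.7, 0.6, 0.3]]
i = choose_next_hop(neighbors)
td_update(neighbors[i], reward=1.0, next_candidates=neighbors, done=False)
```

Scoring each candidate neighbor individually, rather than fixing a global action space, keeps the sketch distributed in spirit: every node can evaluate whatever neighbor set it currently observes, which matches the varying connectivity of an underwater topology.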