Underwater acoustic sensor networks (UASNs) are used in a variety of scenarios, such as marine exploration and the development of marine resources. However, owing to the challenging underwater environment, UASNs suffer from high propagation delay, low transmission reliability, and limited energy, all of which hinder efficient packet delivery. Moreover, sparse topology, node or link failures, and other factors can create void holes, leading to considerable packet retransmissions and low network reliability. To this end, this paper proposes a reinforcement-learning-based opportunistic routing protocol (ROEVA) that reduces energy consumption, improves transmission reliability, and addresses routing voids in underwater acoustic sensor networks. To seek optimal routing rules, a reward function based on reinforcement learning is proposed, in which energy, delay, link quality, and depth information are all taken into account for appropriate routing decisions. Before a data packet is forwarded, a two-hop availability checking function identifies trap nodes and avoids routing voids. In addition, to reduce packet redundancy and collisions, a waiting mechanism derived from opportunistic routing is proposed: according to the calculated Q-values, it determines the priority list for packet forwarding. Evaluation results obtained by varying the number of nodes from 100 to 500 demonstrate that the proposed ROEVA protocol outperforms HHVBF, RCAR, QELAR, and GEDAR in terms of energy efficiency, packet delivery ratio (PDR), average hop count, and end-to-end delay.
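The abstract describes two core mechanisms: a two-hop availability check that filters out trap nodes, and a Q-value-ordered priority list for candidate forwarders. A minimal sketch of how such a scheme might look is given below; the `Node` structure, the depth-based notion of "availability," and the Q-value table are illustrative assumptions, not the paper's actual definitions.

```python
# Illustrative sketch only (not the paper's implementation): filter
# forwarding candidates with a hypothetical two-hop availability check,
# then order them by Q-value to form the forwarding priority list.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Node:
    node_id: int
    depth: float                           # metres; packets travel toward the surface (depth 0)
    neighbors: List[int] = field(default_factory=list)

def two_hop_available(node: "Node", nodes: Dict[int, "Node"]) -> bool:
    """Assumed availability test: a candidate is usable only if it has at
    least one shallower neighbor, i.e. it is not a trap node from which
    the packet could make no further upward progress."""
    return any(nodes[n].depth < node.depth for n in node.neighbors if n in nodes)

def priority_list(sender: "Node", nodes: Dict[int, "Node"],
                  q: Dict[int, float]) -> List[int]:
    """Candidate forwarders are the sender's shallower neighbors that pass
    the two-hop check, sorted by descending Q-value (highest Q forwards
    first; the rest wait, suppressing duplicates if they overhear it)."""
    candidates = [n for n in sender.neighbors
                  if n in nodes
                  and nodes[n].depth < sender.depth
                  and two_hop_available(nodes[n], nodes)]
    return sorted(candidates, key=lambda n: q[n], reverse=True)

# Example topology: node 3 is a trap node (no shallower neighbor), so it
# is excluded even though it holds the highest Q-value.
nodes = {
    1: Node(1, 100.0, [2, 3, 5]),
    2: Node(2, 80.0, [4]),
    3: Node(3, 70.0, []),
    4: Node(4, 50.0, []),
    5: Node(5, 60.0, [4]),
}
q = {2: 0.5, 3: 0.9, 5: 0.8}
print(priority_list(nodes[1], nodes, q))   # node 5 first, then node 2
```

The waiting mechanism described in the abstract would then assign each candidate a holding time proportional to its rank in this list, so lower-priority nodes cancel their transmission when they overhear the packet already being forwarded.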