In this work, we consider a point-to-point underwater optical wireless communication scenario where an underwater sensor (US) transmits its sensing data to a remotely operated vehicle (ROV). Before the US transmits its data, the ROV performs simultaneous lightwave information and power transfer (SLIPT), delivering both control data and lightwave power to the US. Under the considered scenario, our objective is to maximize energy harvesting at the US while supporting a predetermined communication performance between the two nodes. To achieve this objective, we develop a hierarchical deep Q-network (DQN)–deep deterministic policy gradient (DDPG)-based online algorithm. This algorithm involves two reinforcement learning agents: the ROV and the US. The role of the ROV agent is to determine an optimal beam-divergence angle that maximizes the received optical signal power at the US while ensuring a seamless optical link. Meanwhile, the US agent, which is influenced by the decision of the ROV agent, is responsible for determining the time-switching and power-splitting ratios that maximize energy harvesting without compromising the required communication performance. Unlike existing studies that do not account for adaptive parameter control in underwater SLIPT, the proposed algorithm adaptively fine-tunes the optimization parameters in response to varying underwater environmental conditions and diverse user requirements.
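The hierarchical decision structure described above can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the candidate angle set, the state representation, and the placeholder random policies stand in for the trained DQN (discrete beam-divergence angle at the ROV) and DDPG (continuous time-switching and power-splitting ratios at the US) networks, which the abstract does not specify.

```python
import random

# Hypothetical candidate beam-divergence angles (degrees); the paper's
# actual action space is not given in the abstract.
BEAM_ANGLES = [5.0, 10.0, 15.0, 20.0]


def rov_policy(state):
    """DQN-style upper-level agent: selects a discrete beam-divergence
    angle. A trained network would pick argmax-Q; this placeholder
    samples uniformly."""
    return random.choice(BEAM_ANGLES)


def us_policy(state, beam_angle):
    """DDPG-style lower-level agent: outputs continuous time-switching
    (tau) and power-splitting (rho) ratios in [0, 1], conditioned on
    the ROV's angle decision. Placeholder actor: uniform samples."""
    tau = random.random()  # time-switching ratio
    rho = random.random()  # power-splitting ratio
    return tau, rho


def step(state):
    """One hierarchical decision: the ROV acts first, then the US acts
    given the ROV's choice, mirroring the agent ordering in the text."""
    angle = rov_policy(state)
    tau, rho = us_policy(state, angle)
    return angle, tau, rho


random.seed(0)
angle, tau, rho = step(state={"turbidity": "low"})
print(angle, tau, rho)
```

The key structural point the sketch captures is the one-way coupling: the US agent's action space is conditioned on the ROV agent's angle decision, not the other way around.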