Model-based optimal action selection for Dyna-Q reverberation suppression cognitive sonar

Yubin Fu, Xiaochuan Ma, Chao Feng, Xingxuan Pei, Pengzhuo Li

doi:10.1186/s13634-023-01054-7

Abstract

In shallow water, the Doppler shift of low-speed targets is frequently obscured by reverberation Doppler spread clutter. This clutter is generated by underwater scatterers and makes Doppler estimation more difficult. To address this problem, this paper proposes a reverberation target resolution function based on a statistical model of the Doppler spread clutter. Using the width of the reverberation Doppler clutter, the function determines whether the target is distinguishable from the clutter and adjusts the waveform parameters accordingly. In addition, the reverberation Doppler spread clutter varies in time and space and is affected by grazing angle, waves, wind speed, fish, and other factors, so the sonar waveform parameters must be adjusted continually. This paper therefore combines reinforcement-learning-based cognitive sonar with the reverberation target resolution function to evaluate different waveforms in different environments. As a result, the sonar can adjust its waveform parameters in real time and obtain the optimal waveform for each environment. Furthermore, the action selection strategy of Dyna-Q reinforcement learning is optimized, and a model-based maximum action selection Dyna-Q algorithm (Dyna-Q-Max-Action) is proposed. Compared with the traditional Dyna-Q and Q-learning algorithms, the proposed algorithm requires fewer episodes. Finally, numerical simulations verify the effectiveness of the proposed algorithm.
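For readers unfamiliar with the Dyna-Q baseline the abstract compares against, the sketch below shows textbook tabular Dyna-Q: each real step performs a direct Q-learning update, records the observed outcome in a model, and then replays several simulated transitions from that model. It is not the paper's Dyna-Q-Max-Action, whose action selection rule is not specified here. The waveform-selection environment is entirely hypothetical: the target Doppler shift, the clutter half-widths used as states, the candidate pulse durations used as actions, the time cost, and the `is_resolvable` rule (Doppler resolution approximated as 1/T for a CW pulse) are illustrative assumptions, not the paper's resolution function.

```python
import random

# --- Hypothetical toy environment (illustrative assumptions, not from the paper) ---
TARGET_DOPPLER_HZ = 2.2                    # assumed Doppler shift of a slow target
CLUTTER_HALFWIDTH_HZ = [0.5, 1.5, 1.9]     # assumed reverberation spread per environment state
PULSE_DURATIONS_S = [0.5, 1.0, 2.0, 4.0]   # candidate CW pulse lengths (the actions)
TIME_COST = 0.1                            # assumed penalty per second of pulse length


def is_resolvable(state: int, action: int) -> bool:
    """Hypothetical test: the target must clear the clutter half-width plus the
    waveform's Doppler resolution, approximated as 1/T for a CW pulse."""
    doppler_resolution = 1.0 / PULSE_DURATIONS_S[action]
    return TARGET_DOPPLER_HZ > CLUTTER_HALFWIDTH_HZ[state] + doppler_resolution


def step(state: int, action: int) -> tuple[float, int]:
    """Return (reward, next_state); the assumed environment cycles through clutter states."""
    hit = 1.0 if is_resolvable(state, action) else -1.0
    reward = hit - TIME_COST * PULSE_DURATIONS_S[action]
    next_state = (state + 1) % len(CLUTTER_HALFWIDTH_HZ)
    return reward, next_state


def dyna_q(episodes=300, steps_per_episode=30, planning_steps=10,
           alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    n_states, n_actions = len(CLUTTER_HALFWIDTH_HZ), len(PULSE_DURATIONS_S)
    q = [[0.0] * n_actions for _ in range(n_states)]
    model = {}  # (state, action) -> (reward, next_state), last observed outcome

    def greedy(s):
        best = max(q[s])
        return rng.choice([a for a in range(n_actions) if q[s][a] == best])

    def update(s, a, r, s2):
        q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])

    for _ in range(episodes):
        s = rng.randrange(n_states)
        for _ in range(steps_per_episode):
            # epsilon-greedy action selection
            a = rng.randrange(n_actions) if rng.random() < epsilon else greedy(s)
            r, s2 = step(s, a)
            update(s, a, r, s2)            # (a) direct RL from real experience
            model[(s, a)] = (r, s2)        # (b) model learning
            for _ in range(planning_steps):  # (c) planning from simulated experience
                ps, pa = rng.choice(list(model))
                pr, ps2 = model[(ps, pa)]
                update(ps, pa, pr, ps2)
            s = s2
    return q


if __name__ == "__main__":
    q = dyna_q()
    policy = [max(range(len(row)), key=row.__getitem__) for row in q]
    for s, a in enumerate(policy):
        print(f"clutter half-width {CLUTTER_HALFWIDTH_HZ[s]} Hz -> pulse {PULSE_DURATIONS_S[a]} s")
```

With these assumed numbers, the greedy policy should settle on the shortest pulse that still separates the target from the clutter in each state (1.0 s, 2.0 s, and 4.0 s). Setting `planning_steps=0` removes the model replay and reduces the loop to one-step Q-learning, which is the other baseline named in the abstract.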
