Abstract

Underwater acoustic target classification (UATC) has traditionally relied on time–frequency (TF) analysis, with the mel-spectrogram widely used for its compact, 2D image-like format. However, many previous UATC approaches have not adequately acknowledged the distinction between natural images and mel-spectrograms. In this paper, we propose a novel approach that transforms mel-spectrograms into 3D data and introduce an efficient 3D Spectrogram Network (3DSNet) that treats the time and frequency dimensions separately. 3DSNet consists of three key components: a Time–Frequency Separate Convolution (TFSConv) module, an Asymmetric Pooling (AsyPool) module, and a Channel-Time Attention (CTA) module. The TFSConv module applies two convolution operators, one along the time dimension and one along the frequency dimension, to extract 3D TF features; it serves as a lightweight approximation of 3D convolution, reducing parameters by approximately 85%. To preserve the inherent distinction between the frequency and time dimensions, the AsyPool module employs two separate downsampling strategies. Additionally, the CTA module captures more informative and meaningful TF features. We evaluate 3DSNet on two publicly available underwater acoustic datasets, and the results demonstrate that our method achieves the best trade-off between performance and parameter count among the mainstream methods compared.
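The abstract does not specify the exact TFSConv kernel configuration, so the sketch below uses a hypothetical factorization of a dense k×k×k 3D convolution into one convolution along the time axis and one along the frequency axis. It only illustrates why separating the two axes shrinks the parameter count; the ~85% figure reported for TFSConv will depend on the actual kernel shapes and channel layout used in the paper.

```python
def conv3d_params(c_in, c_out, k):
    """Parameter count of a dense 3D convolution with a k x k x k kernel."""
    return c_in * c_out * k ** 3

def separable_params(c_in, c_out, k):
    """Parameter count of a hypothetical time/frequency factorization:
    a time convolution (k x 1 x 1) mapping c_in -> c_out, followed by
    a frequency convolution (1 x k x 1) mapping c_out -> c_out."""
    return c_in * c_out * k + c_out * c_out * k

# Illustrative comparison for 64 channels and 3x3x3 kernels.
dense = conv3d_params(64, 64, 3)       # 64 * 64 * 27 weights
factored = separable_params(64, 64, 3)  # 64 * 64 * (3 + 3) weights
saving = 1 - factored / dense
print(f"parameter reduction: {saving:.0%}")  # → parameter reduction: 78%
```

Even this crude factorization removes most of the weights; richer designs (e.g. different kernel sizes per axis, or depthwise stages) can push the saving toward the 85% the paper reports.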
