Abstract

Underwater Acoustic Target Recognition (UATR) remains one of the most challenging tasks in underwater signal processing for three reasons: labeled data are difficult to acquire, the intrinsic characteristics of the signals vary in time and space, and other noise sources interfere with the targets. Although some deep learning methods have been shown to achieve state-of-the-art accuracy, recognition accuracy can be improved further by designing a Residual Network and optimizing feature extraction. To give a more comprehensive representation of the underwater acoustic signal, we first propose three-dimensional fusion features together with the SpecAugment data augmentation strategy. An 18-layer Residual Network (ResNet18), which contains an embedding layer with the center loss function, is then designed to train on the aggregated features with an adaptive learning rate. Recognition experiments are conducted on a ship-radiated noise dataset recorded in a real environment. The resulting accuracy of 94.3% indicates that the proposed method is well suited to underwater acoustic recognition problems and outperforms the other classification methods compared.
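SpecAugment, as originally described for speech, combines time warping with frequency masking and time masking. The following is a minimal Python sketch of the two masking operations only. It assumes the spectrogram is a list of rows (frequency bins) by columns (time frames) and that masked values are set to 0.0, which equals the mean if the features have been normalized to zero mean. The function name `spec_augment` and the default mask counts and widths are hypothetical illustration values, not the settings used in this work.

```python
import random


def spec_augment(spec, num_freq_masks=1, max_freq_width=8,
                 num_time_masks=1, max_time_width=20, rng=None):
    """Return a copy of `spec` (rows = frequency bins, columns = time frames)
    with SpecAugment-style frequency and time masks set to 0.0.
    Time warping is not implemented in this sketch."""
    rng = rng or random.Random()
    out = [row[:] for row in spec]  # copy so the input is left untouched
    n_freq = len(out)
    n_time = len(out[0]) if n_freq else 0

    # Frequency masking: zero a band of f consecutive frequency rows.
    for _ in range(num_freq_masks):
        f = rng.randint(0, min(max_freq_width, n_freq))
        f0 = rng.randint(0, n_freq - f)
        for i in range(f0, f0 + f):
            out[i] = [0.0] * n_time

    # Time masking: zero a block of t consecutive frames in every row.
    for _ in range(num_time_masks):
        t = rng.randint(0, min(max_time_width, n_time))
        t0 = rng.randint(0, n_time - t)
        for row in out:
            for j in range(t0, t0 + t):
                row[j] = 0.0
    return out
```

For example, `spec_augment(log_mel, rng=random.Random(0))` returns a masked copy and leaves `log_mel` unchanged; passing a seeded `random.Random` makes the augmentation reproducible across runs.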

Highlights

  • The recordings are grouped into five classes: background noise; fishing boats, trawlers, mussel boats, tugboats, and the dredger; motorboats, pilot boats, and sailboats; passenger ferries; and ocean liners and ro-ro vessels. Before preprocessing, truncating the original recordings yields 1956 sound clips, each 5 s long

  • We propose three-dimensional fusion features together with the SpecAugment data augmentation strategy, and an 18-layer Residual Network (ResNet18) containing an embedding layer with the center loss function, which together reach an accuracy of 94.3%

  • A convolutional neural network (CNN-2) fed with the optimized Log Mel (LM) + Mel Frequency Cepstral Coefficients (MFCC) + CCTZ feature achieves an average accuracy of 0.906, surpassing the 0.845 of CNN-1
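The center loss mentioned above is usually defined, following Wen et al., as L_C = 1/2 Σ_i ||x_i − c_{y_i}||², where x_i is the embedding of sample i and c_{y_i} is the center of its class; it is added to the softmax cross-entropy L_S as L = L_S + λ·L_C. The sketch below computes this joint loss for a mini-batch and applies the usual center update c_j ← c_j − α·Δc_j, with Δc_j = Σ_{i: y_i = j} (c_j − x_i) / (1 + n_j). It is a pure-Python illustration that assumes embeddings and logits are given as lists of floats. The function names, the batch averaging, and the values of `lam` (λ) and `alpha` (α) are assumptions, not the authors' implementation or hyperparameters.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy of one sample's logits against its integer class label."""
    m = max(logits)  # subtract the max logit for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def center_loss(embeddings, labels, centers):
    """L_C = 1/2 * sum_i ||x_i - c_{y_i}||^2 over the mini-batch."""
    total = 0.0
    for x, y in zip(embeddings, labels):
        total += sum((xk - ck) ** 2 for xk, ck in zip(x, centers[y]))
    return 0.5 * total


def joint_loss(logits_batch, embeddings, labels, centers, lam=0.01):
    """L = L_S + lam * L_C, both averaged over the batch (averaging is an assumption)."""
    n = len(labels)
    ls = sum(softmax_cross_entropy(z, y) for z, y in zip(logits_batch, labels)) / n
    lc = center_loss(embeddings, labels, centers) / n
    return ls + lam * lc


def update_centers(embeddings, labels, centers, alpha=0.5):
    """Return new centers: c_j <- c_j - alpha * delta_j, with
    delta_j = sum_{i: y_i = j} (c_j - x_i) / (1 + n_j)."""
    new_centers = [c[:] for c in centers]
    for j, c in enumerate(centers):
        members = [x for x, y in zip(embeddings, labels) if y == j]
        if not members:
            continue  # classes absent from the batch keep their center
        delta = [sum(ck - x[k] for x in members) / (1 + len(members))
                 for k, ck in enumerate(c)]
        new_centers[j] = [ck - alpha * dk for ck, dk in zip(c, delta)]
    return new_centers
```

In a training loop, `joint_loss` would be minimized with respect to the network weights, while `update_centers` is applied once per mini-batch so that each class center gradually tracks the mean embedding of its class.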


Summary

Introduction

As a key technology for making underwater acoustic equipment systems more intelligent, recognition of underwater acoustic target-radiated noise is one of the most important research directions in underwater acoustic signal processing [1]. Mel Frequency Cepstral Coefficients (MFCC) and the Log-Mel Spectrogram (LM) are two widely used features in Environmental Sound Classification (ESC) tasks [9,10], with acceptable performance. Although these features originate in speech and general sound processing, MFCC and its first-order and second-order differential features have proven effective for underwater acoustic target recognition [8]. Li et al. [5] introduce a feature optimization approach based on Deep Neural Networks (DNN) with an optimized loss function and achieve an accuracy of 84%. We propose three-dimensional fusion features together with the SpecAugment data augmentation strategy, and an 18-layer Residual Network (ResNet18) containing an embedding layer with the center loss function, which reaches an accuracy of 94.3% on a ship-radiated noise dataset recorded in a real environment.
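First- and second-order differential (delta and delta-delta) MFCC are commonly computed with a regression over neighboring frames, d_t = Σ_{n=1}^{N} n·(c_{t+n} − c_{t−n}) / (2·Σ_{n=1}^{N} n²). The sketch below implements that HTK-style formula in plain Python, assuming the MFCC are given as a list of per-frame coefficient vectors and that the first and last frames are replicated at the boundaries. The window half-width N = 2 and the function names are illustrative assumptions, not necessarily what [8] or this work used.

```python
def delta(frames, N=2):
    """Regression-based delta of each coefficient over time.
    `frames` is a list of equal-length feature vectors, one per time frame."""
    T = len(frames)
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = []
    for t in range(T):
        d = []
        for k in range(len(frames[t])):
            acc = 0.0
            for n in range(1, N + 1):
                nxt = frames[min(t + n, T - 1)][k]  # replicate the last frame
                prv = frames[max(t - n, 0)][k]      # replicate the first frame
                acc += n * (nxt - prv)
            d.append(acc / denom)
        out.append(d)
    return out


def stack_mfcc_deltas(mfcc, N=2):
    """Concatenate static, first-order, and second-order features per frame."""
    d1 = delta(mfcc, N)
    d2 = delta(d1, N)  # second order = delta of the delta
    return [a + b + c for a, b, c in zip(mfcc, d1, d2)]
```

With 13 MFCC per frame, `stack_mfcc_deltas` returns 39 values per frame (static, delta, and delta-delta).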

Description of the Classification Method
Preprocessing
Feature
Structure of the ResNet18
The Embedding Layer with the Center Loss Function and Softmax
Dataset Description and Preparation
Background noise
Experimental Result
Experiment A
Experiment B
Experiment C
Findings
Conclusions