Acoustic surveys play a pivotal role in fisheries management. During these surveys, acoustic signals are sent into the water and the strength of the reflection, known as backscatter, is recorded. The collected data are typically annotated manually to support acoustic target classification (ATC), a process that is both labor-intensive and time-consuming. The primary objective of this study is to develop an annotation-free deep learning model that extracts acoustic features and improves the representation of acoustic data. For this purpose, we adopt a self-supervised method inspired by the Self DIstillation with NO Labels (DINO) model. Extracting useful acoustic features is an intricate task due to the inherent variability and complexity of biological targets, as well as the environmental and technical factors influencing sound interactions. The proposed model is trained with three sampling methods: random sampling, which ignores the class imbalance present in the acoustic survey data; class-balanced sampling, which ensures equal representation of known categories; and intensity-based sampling, which selects data to capture backscatter variations. The quality of the extracted features is then evaluated and compared. We show that, compared with using the untreated data, the extracted features improve the discriminative power of several machine learning methods (k-nearest neighbors (kNN), linear regression, and multinomial logistic regression) for ATC. The improvement is reflected in higher kNN accuracy (77.55% vs. 71.93%), macro AUC in logistic regression (0.92 vs. 0.80), and R² in linear regression (0.69 vs. 0.45). Our findings highlight the advantage of applying emerging self-supervised techniques in fisheries acoustics. This study thus contributes to the ongoing efforts to improve the efficiency of acoustic surveys in fisheries management.
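The class-balanced and intensity-based sampling strategies described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the use of mean backscatter as a sampling weight, and the with-replacement fallback for rare classes are all assumptions made for the example.

```python
import numpy as np

def class_balanced_indices(labels, n_per_class, rng=None):
    """Draw an equal number of sample indices from each known class,
    regardless of how frequent the class is in the survey data."""
    rng = np.random.default_rng(rng)
    selected = []
    for c in np.unique(labels):
        pool = np.flatnonzero(labels == c)
        # Sample with replacement when a class has fewer examples
        # than requested, so rare classes are not under-drawn.
        selected.append(
            rng.choice(pool, size=n_per_class,
                       replace=len(pool) < n_per_class)
        )
    return np.concatenate(selected)

def intensity_weighted_indices(backscatter, n, rng=None):
    """Draw indices with probability proportional to backscatter
    intensity, so the sample captures strong-echo variations."""
    rng = np.random.default_rng(rng)
    weights = np.asarray(backscatter, dtype=float)
    weights = weights - weights.min() + 1e-9  # shift to strictly positive
    p = weights / weights.sum()
    return rng.choice(len(weights), size=n, replace=False, p=p)
```

Random sampling would simply be a uniform `rng.choice` over all indices; the two helpers above differ only in how they reweight that draw.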