Abstract

Accurately predicting an owl species from its sound can support owl conservation. Deep learning is currently the preferred approach for building accurate owl sound classifiers, owing to its excellent performance on audio data. However, deep learning models generally underperform on small datasets, which is the case for recognizing scops owl sounds. To overcome this issue, we propose a transfer learning strategy, common in computer vision tasks, that alleviates overfitting in a deep learning model for owl sound classification. Our neural network architecture uses the backbone of an EfficientNet model pre-trained on the massive ImageNet database. The model takes sound input converted into two image representations: a spectrogram and Mel Frequency Cepstral Coefficients (MFCCs). This strategy enables the use of a relatively small pre-trained image classification model, which is widely available, for transfer learning in owl sound classification. Deploying the lightweight model in an automatic sound classifier provides a fast and accurate tool for various owl conservation purposes.
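The abstract describes converting sound into image representations before feeding them to a pre-trained vision backbone. The following is a minimal sketch of the first step, computing a magnitude spectrogram with numpy; the FFT size, hop length, and windowing choices here are illustrative assumptions, not the paper's actual parameters, and the downstream EfficientNet stage is only indicated in a comment.

```python
import numpy as np

def spectrogram(signal, n_fft=256, hop=128):
    """Magnitude spectrogram via a sliding Hann-windowed FFT.

    Returns an array of shape (num_frames, n_fft // 2 + 1):
    rows are time frames, columns are frequency bins.
    """
    window = np.hanning(n_fft)
    frames = [signal[i:i + n_fft] * window
              for i in range(0, len(signal) - n_fft + 1, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=1))

# Synthetic 1-second stand-in for an owl call: a 440 Hz tone at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
signal = np.sin(2 * np.pi * 440 * t)

spec = spectrogram(signal)
# Log-scale the magnitudes, a common step before treating the result
# as an image and passing it to a pre-trained backbone (e.g. EfficientNet).
log_spec = np.log1p(spec)
print(log_spec.shape)
```

A real pipeline would compute an MFCC image the same way (mel filterbank plus a discrete cosine transform on the log spectrogram) and stack or concatenate the two representations as model input.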
