Abstract

In natural environments, bird sounds are often accompanied by background noise, so denoising is crucial to automated bird sound recognition. Recently, thanks to neural network embeddings, deep clustering has outperformed traditional denoising approaches such as filter-based methods, because it can still separate the sources when the noise occupies the same frequency range as the bird sounds. In this paper, we propose a generalized denoising method based on deep clustering that can process more complex recordings with less distortion. We also modify the original affinity loss (AL) function to obtain a novel loss function, named Joint Center Loss (JCL), which ensures that the embedding vectors with the minimum distance belong to the same source. JCL both increases the inter-class variance and decreases the intra-class variance of the embeddings. Experiments are conducted on a gated convolutional neural network architecture and on a bidirectional long short-term memory architecture, each with different loss functions. At a signal-to-noise ratio of -3 dB, the proposed denoising method improves recognition accuracy by a relative 9.5% in the best case, and JCL improves the Relative Root Mean Square Error (RRMSE) by a relative 14.2% compared with the original affinity loss.
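For readers unfamiliar with the affinity loss mentioned above, the following is a minimal sketch of the objective as it is commonly written in the deep clustering literature, ||VV^T - YY^T||_F^2, where V holds one embedding per time-frequency bin and Y holds one-hot source labels. It assumes the paper's AL follows this standard form; the function names are illustrative, and JCL itself is not defined in the abstract, so it is not sketched here. Plain Python lists are used only to keep the example self-contained.

```python
def gram(a, b):
    """Return a^T b for row-major matrices a (N x P) and b (N x Q)."""
    p, q = len(a[0]), len(b[0])
    return [[sum(ra[i] * rb[j] for ra, rb in zip(a, b)) for j in range(q)]
            for i in range(p)]


def frob_sq(m):
    """Squared Frobenius norm of a matrix given as a list of rows."""
    return sum(x * x for row in m for x in row)


def affinity_loss(V, Y):
    """Standard deep clustering affinity loss ||V V^T - Y Y^T||_F^2.

    V: N x D embeddings (one row per time-frequency bin).
    Y: N x C one-hot source labels (assumed form of the training target).
    Uses the expansion ||V^T V||^2 - 2 ||V^T Y||^2 + ||Y^T Y||^2, which
    avoids building the N x N affinity matrices.
    """
    return frob_sq(gram(V, V)) - 2.0 * frob_sq(gram(V, Y)) + frob_sq(gram(Y, Y))


def affinity_loss_naive(V, Y):
    """Same loss computed directly over all N x N pairs, for checking."""
    total = 0.0
    for vi, yi in zip(V, Y):
        for vj, yj in zip(V, Y):
            vv = sum(a * b for a, b in zip(vi, vj))  # embedding affinity
            yy = sum(a * b for a, b in zip(yi, yj))  # 1 if same source, else 0
            total += (vv - yy) ** 2
    return total


# Three bins, two sources (say, bird and noise): bins 0 and 1 share a source.
Y = [[1, 0], [1, 0], [0, 1]]
V_good = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # embeddings match the labels
V_bad = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]   # bin 1 embedded with the wrong source

print(affinity_loss(V_good, Y))
print(affinity_loss(V_bad, Y))
```

The loss is zero when bins from the same source share an embedding and bins from different sources are orthogonal, and it grows as embeddings drift toward the wrong source. This pairwise objective is the baseline that the abstract says JCL modifies.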
