Abstract

Birds play a pivotal role in ecosystem and biodiversity research, and accurate bird identification contributes to the monitoring of biodiversity, understanding of ecosystem functionality, and development of effective conservation strategies. Current methods for bird sound recognition often involve processing bird songs into various acoustic features or fusion features for identification, which can result in information loss and complicate the recognition process. At the same time, the recognition method based on raw bird audio has not received widespread attention. Therefore, this study proposes a bird sound recognition method that utilizes multiple one-dimensional convolutional neural networks to directly learn feature representations from raw audio data, simplifying the feature extraction process. We also apply positional embedding convolution and multiple Transformer modules to enhance feature processing and improve accuracy. Additionally, we introduce a trainable weight array to control the importance of each Transformer module for better generalization of the model. Experimental results demonstrate our model’s effectiveness, with an accuracy rate of 99.58% for the public dataset Birds_data, as well as 98.77% for the Birdsonund1 dataset, and 99.03% for the UrbanSound8K environment sound dataset.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.