Abstract

Bioacoustic event detection is the task of identifying specific animal sounds within biological audio recordings. Only specialists can annotate the temporal locations of animal sounds, since labeling requires knowledge of biological sound, and they can provide only a few annotations because the long duration of biological recordings makes labeling highly labor-intensive. The task is therefore framed as few-shot learning, and the prototypical network handles this setting effectively by learning a representation space that represents each class from the given few examples. In this work, we use the spectro-temporal receptive field (STRF), inspired by the auditory cortex, which responds actively to certain spectro-temporal modulations, as the convolutional kernel of a prototypical network. Bioacoustic events contain rich spectro-temporal modulation, so STRF kernels are expected to capture animal sounds effectively. In addition, the STRF kernels are fixed rather than trained, so a model using them can learn the representation space with fewer parameters. We built a model called Two-Branch STRFNet (TB-STRFNet), in which the STRF branch captures spectro-temporal modulation with STRF kernels and the other branch captures the detailed time-frequency information that may be lost in the STRF branch. TB-STRFNet outperformed the other models, demonstrating the effectiveness of this auditory-system-inspired approach.
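To make the architecture described above concrete, the following is a minimal PyTorch sketch of a two-branch encoder with fixed spectro-temporal kernels combined with a learned time-frequency branch, plus the prototype computation used by prototypical networks. The Gabor-like kernel construction, layer sizes, and function names (make_strf_kernels, TwoBranchEncoder, prototypes) are illustrative assumptions, not the paper's exact parameterization.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_strf_kernels(n_kernels=16, size=9):
    """Fixed (untrained) 2-D spectro-temporal modulation filters.
    Hypothetical Gabor-like construction for illustration only; the paper's
    exact STRF parameterization may differ."""
    kernels = []
    rates = torch.linspace(0.1, 0.5, 4)         # modulation rates
    thetas = torch.linspace(0, torch.pi, 4)     # orientations (up/down sweeps)
    coords = torch.arange(size) - size // 2
    yy, xx = torch.meshgrid(coords, coords, indexing="ij")
    for r in rates:
        for t in thetas:
            rot = xx * torch.cos(t) + yy * torch.sin(t)
            gauss = torch.exp(-(xx**2 + yy**2) / (2 * (size / 4) ** 2))
            kernels.append(gauss * torch.cos(2 * torch.pi * r * rot))
    return torch.stack(kernels)[:n_kernels].unsqueeze(1)  # (K, 1, size, size)

class TwoBranchEncoder(nn.Module):
    """Sketch of a two-branch encoder: a fixed STRF branch plus a learned
    time-frequency branch, concatenated into a single embedding."""
    def __init__(self, emb_dim=64):
        super().__init__()
        strf = make_strf_kernels()
        self.strf_conv = nn.Conv2d(1, strf.shape[0], strf.shape[-1],
                                   padding=strf.shape[-1] // 2, bias=False)
        self.strf_conv.weight.data = strf
        self.strf_conv.weight.requires_grad = False   # STRF kernels stay fixed
        self.strf_pool = nn.AdaptiveAvgPool2d(8)
        self.tf_branch = nn.Sequential(               # learned branch
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(8))
        self.head = nn.Linear((16 + strf.shape[0]) * 8 * 8, emb_dim)

    def forward(self, x):                             # x: (B, 1, mel, time)
        a = self.strf_pool(F.relu(self.strf_conv(x)))  # spectro-temporal branch
        b = self.tf_branch(x)                          # time-frequency branch
        return self.head(torch.cat([a, b], dim=1).flatten(1))

def prototypes(support_emb, support_labels, n_classes):
    """Prototypical-network step: each class prototype is the mean of the
    support embeddings belonging to that class."""
    return torch.stack([support_emb[support_labels == c].mean(0)
                        for c in range(n_classes)])

# A query embedding is then classified by its (negative) distance
# to each class prototype, as in the standard prototypical network.
```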
