Abstract

In the wild, bird vocalizations of the same species across different populations may be different (e.g., so called dialect). Besides, the number of species is unknown in advance. These two facts make the task of bird species recognition based on vocalization a challenging one. This study treats this task as an open set recognition (OSR) cross-corpus scenario. We propose Instance Frequency Normalization (IFN) to remove instance-specific differences across different corpora. Furthermore, an x-vector feature extraction model integrated Time Delay Neural Network (TDNN) and Long Short-Term Memory (LSTM) are designed to better capture sequence information. Finally, the threshold-based Probabilistic Linear Discriminant Analysis (PLDA) is introduced to discriminate the extracted x-vector features to discover the unknown classes. When compared to the best results of the existing method, the average ACCs for the single-corpus and cross-corpus experiments are improved, implying that our method can provide a potential solution and improve performance for cross-corpus bird species recognition based on vocalization in open set condition.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call