Abstract
There is a close biological relationship between cats and humans. Emotion recognition of domestic cat meows plays an essential role in human–animal relationships, animal–environment relationships, and animal conservation. This study proposed a deep learning network named JL-TFMSFNet, which jointly learns time–frequency domain information and multi-scale features to recognize the sound emotions of domestic cats. First, the Mel features of the domestic cat's sound were extracted with a Mel filter, and the proposed Multi-scale Feature Extraction Module (MFEM) was used to extract weighting information from them and generate deeper features. Second, the semantic information in the spectrogram was learned from the time–frequency domain using the Time–Frequency Attention Mechanism (TFAM) proposed in the backbone network. Then, the Diverse Branch Block (DBB) module, with its different receptive fields, was added to improve recognition performance. Finally, validation experiments were conducted on the self-built dataset Cat_Emotion_Sound (CES) and the public dataset UrbanSound8K. The experiments demonstrated that JL-TFMSFNet outperformed state-of-the-art sound classification models. On these two datasets, the proposed model achieved average accuracies of 94.43% and 96.14% and F1-scores of 94.01% and 95.65%, respectively.
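As an illustration of the first step only, the Mel feature extraction can be sketched as follows. This is a minimal sketch of a generic log-Mel front end in NumPy, not the authors' implementation: the abstract gives no settings, so the frame length, hop size, number of Mel bands, and all function names (`mel_filterbank`, `log_mel_spectrogram`) are assumptions made for illustration. MFEM, TFAM, and DBB are not reproduced here.

```python
import numpy as np

def hz_to_mel(f):
    # Common (HTK-style) Hz -> Mel conversion.
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular Mel filters, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    # n_mels triangles need n_mels + 2 edge points, evenly spaced in Mel.
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(signal, sr, n_fft=1024, hop=256, n_mels=64):
    """Log-Mel spectrogram in dB, shape (n_mels, n_frames).

    n_fft, hop and n_mels are assumed values, not taken from the paper.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[t * hop:t * hop + n_fft] * window
                       for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2      # (n_frames, n_fft // 2 + 1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T     # (n_frames, n_mels)
    return 10.0 * np.log10(mel + 1e-10).T                 # small offset avoids log(0)

if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    tone = np.sin(2 * np.pi * 440.0 * t)  # synthetic 1 s test tone, not real cat audio
    print(log_mel_spectrogram(tone, sr).shape)
```

The resulting two-dimensional array (Mel bands by time frames) is the kind of spectrogram input over which a time–frequency attention mechanism and multi-scale convolutional branches would then operate.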