Dog barks are a crucial means by which dogs express their emotions, conveying needs and emotional states such as anger, sadness, or excitement. As the number of pet dogs grows, understanding their emotional states is essential for better care and interaction, yet accurately interpreting the emotional content of dog barks is often difficult for humans. Developing technology for recognizing the emotions in dog barks is therefore of great importance. In this paper, we build a dataset for dog bark emotion recognition and propose a method that uses the Synchrosqueezing Short-Time Fourier Transform (SST_STFT) to extract time–frequency features of dog barks. Considering the characteristics of dog barks, we design an emotion recognition model based on Mamba. The model leverages a selective state-space model (Selective SSM) for global modeling of long feature sequences, enabling rapid and effective processing of time–frequency features and accurate perception of their variations. Experiments on the self-built DogEmotionSound dataset, as well as the public IEMOCAP and UrbanSound8K datasets, demonstrate that our model outperforms existing state-of-the-art sound recognition models across various evaluation metrics, with recognition accuracies of 91.97%, 95.36%, and 67.25%, respectively. The code is open-source at https://github.com/yangcjya/A-barking-emotion-recognition-method-based-on-Mamba.git.
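The sketch below illustrates the kind of pipeline the abstract describes, but it is not the authors' released code: SST_STFT magnitudes are used as a (time, frequency) feature sequence and passed through a Mamba block before classification. It assumes the third-party `ssqueezepy` and `mamba_ssm` packages; the class names, the five-class output, and the example file `bark.wav` are hypothetical placeholders.

```python
# Minimal sketch (assumptions noted above), not the authors' implementation.
import numpy as np
import torch
import torch.nn as nn
import librosa
from ssqueezepy import ssq_stft      # synchrosqueezed STFT
from mamba_ssm import Mamba          # selective SSM block (requires CUDA build)


class BarkEmotionNet(nn.Module):
    """Project SST_STFT frames, model them with a Mamba block, pool, classify."""

    def __init__(self, n_freq_bins: int, n_classes: int, d_model: int = 256):
        super().__init__()
        self.proj = nn.Linear(n_freq_bins, d_model)
        self.mamba = Mamba(d_model=d_model, d_state=16, d_conv=4, expand=2)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, x):                 # x: (batch, time, n_freq_bins)
        h = self.proj(x)                  # (batch, time, d_model)
        h = self.mamba(h)                 # global modeling over the time axis
        return self.head(h.mean(dim=1))   # mean-pool frames, then classify


def sst_features(path: str, sr: int = 16000) -> torch.Tensor:
    """Load audio and return |SST_STFT| magnitudes as a (time, freq) tensor."""
    audio, _ = librosa.load(path, sr=sr, mono=True)
    Tx, *_ = ssq_stft(audio)              # Tx: complex (freq, time) synchrosqueezed STFT
    mag = np.ascontiguousarray(np.abs(Tx).T)
    return torch.from_numpy(mag).float()


# Hypothetical usage (file name and class count are placeholders):
# feats = sst_features("bark.wav").unsqueeze(0).cuda()        # (1, time, freq)
# model = BarkEmotionNet(n_freq_bins=feats.shape[-1], n_classes=5).cuda()
# logits = model(feats)
```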