This paper proposes a system based on multiple convolutional neural networks (CNNs) for detecting, tracking, and recognizing the emotions of dogs in surveillance videos. The system detects dogs in each frame of a video, tracks them across frames, and recognizes their emotions. Dog detection is performed with a YOLOv3 model. The detected dogs are tracked in real time with a deep association metric model (DeepDogTrack), which combines a Kalman filter with a CNN. The dogs' emotional behaviors are categorized into three types, angry (or aggressive), happy (or excited), and neutral (or general), on the basis of manual judgments made by veterinary experts and dog breeders. The system extracts dog sub-images from the video, determines whether these images are sufficient for emotion recognition, and uses the long short-term deep features of dog memory network (LDFDMN) model to identify each dog's emotion. Dog detection experiments were conducted on two image datasets to verify the model's effectiveness; the detection accuracy rates were 97.59% and 94.62%, respectively. Detection errors occurred when the dog's facial features were obscured, when the dog belonged to an uncommon breed, when the dog's body was partially covered, or when the dog region was incomplete. Dog-tracking experiments were conducted on three video datasets, each containing one or more dogs. The highest tracking accuracy rate (93.02%) was achieved when only one dog appeared in the video, and the highest rate for videos containing multiple dogs was 86.45%. Tracking errors occurred when an increasing portion of a dog's body was occluded as the dog entered or left the frame, resulting in tracking loss. Emotion recognition experiments were conducted on two video datasets, yielding accuracy rates of 81.73% and 76.02%, respectively. Recognition errors occurred when background removal left the dog region unclear, causing the wrong emotion to be recognized. Of the three emotions, anger was expressed most prominently; accordingly, the recognition rate for anger was higher than those for the happy and neutral classes. Further recognition errors occurred when the dog's movements were too subtle or too fast, the image was blurred, the shooting angle was poor, or the video resolution was too low. Nevertheless, the experiments showed that the proposed system can correctly recognize the emotions of dogs in videos. Its accuracy could be further improved by training the detection, tracking, and emotion recognition models on more images and videos, after which the system could be applied in real-world settings to assist in the early identification of dogs that may exhibit aggressive behavior.
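As a rough illustration of the recognition stage described above, the sketch below shows how per-frame CNN features of a single tracked dog could be fed to an LSTM that outputs one of the three emotion classes. This is not the authors' implementation: the names Track, EmotionLSTM, and classify_track, the feature dimension, and the backbone interface are hypothetical, and the YOLOv3 detector and DeepDogTrack tracker are assumed to be provided elsewhere.

    # Hypothetical sketch (Python / PyTorch) of per-track emotion classification.
    from dataclasses import dataclass, field
    from typing import List

    import torch
    import torch.nn as nn


    @dataclass
    class Track:
        """One tracked dog: an id plus the per-frame sub-images cropped for it."""
        track_id: int
        crops: List[torch.Tensor] = field(default_factory=list)  # each crop: (3, H, W)


    class EmotionLSTM(nn.Module):
        """LSTM over per-frame CNN features of a tracked dog -> angry / happy / neutral."""

        def __init__(self, feat_dim: int = 512, hidden: int = 256, n_classes: int = 3):
            super().__init__()
            self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
            self.head = nn.Linear(hidden, n_classes)

        def forward(self, feats: torch.Tensor) -> torch.Tensor:
            # feats: (batch, time, feat_dim); classify from the last hidden state
            _, (h_n, _) = self.lstm(feats)
            return self.head(h_n[-1])


    def classify_track(track: Track, backbone: nn.Module, classifier: EmotionLSTM) -> int:
        """Run an (assumed) CNN backbone on each crop, feed the feature sequence to the
        LSTM, and return the index of the predicted emotion class."""
        with torch.no_grad():
            feats = torch.stack(
                [backbone(c.unsqueeze(0)).flatten(1).squeeze(0) for c in track.crops]
            )                                          # (time, feat_dim)
            logits = classifier(feats.unsqueeze(0))    # (1, n_classes)
        return int(logits.argmax(dim=1).item())

In this sketch the temporal modeling is deliberately minimal (a single-layer LSTM over a sequence of crop features); the paper's LDFDMN model and its sub-image sufficiency check are not reproduced here.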