Abstract

This work aims to understand a dog’s behavior toward environmental stimuli. Unlike previous works, we collect multi-modal data, including both video and audio, recorded from the dog’s egocentric perspective. We propose to model the association between the dog’s reaction and the visual and auditory stimuli it perceives using machine learning, in particular an extended Convolutional Neural Network (eCNN). The eCNN model takes color images, the Short-Time Fourier Transform (STFT) of the audio, and motion fields extracted from image sequences as input, and outputs a prediction of the dog’s reaction, classified as Sit, Stand, Walk, or Smell. Our proposed model achieves promising prediction results, with an average accuracy of 79.02% over all four classes. We also evaluate model performance using each of the image, audio, and motion modalities separately. Our results show that the dog responds strongly to low-frequency sounds and to color differences in its field of view. These findings provide valuable insights into animal behavior and intelligence, as well as guidance for building robotic companion dogs.
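
To make the described architecture concrete, the sketch below shows one plausible way a multi-branch CNN could fuse the three input modalities (color image, audio STFT spectrogram, motion field) into a four-way reaction classifier. This is an illustrative assumption, not the authors' released implementation; the layer sizes, channel counts, and fusion-by-concatenation design are hypothetical choices.

```python
# Minimal sketch (an assumption, not the paper's actual eCNN) of a multi-branch
# CNN that fuses egocentric color images, audio STFT spectrograms, and motion
# fields into a 4-way reaction classifier (Sit, Stand, Walk, Smell).
import torch
import torch.nn as nn

def conv_branch(in_channels: int) -> nn.Sequential:
    """Small convolutional encoder; one instance per input modality."""
    return nn.Sequential(
        nn.Conv2d(in_channels, 16, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Conv2d(16, 32, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.AdaptiveAvgPool2d((4, 4)),
        nn.Flatten(),  # -> 32 * 4 * 4 = 512 features per modality
    )

class MultiModalDogNet(nn.Module):
    def __init__(self, num_classes: int = 4):
        super().__init__()
        self.image_branch = conv_branch(3)   # RGB frame
        self.audio_branch = conv_branch(1)   # STFT magnitude spectrogram
        self.motion_branch = conv_branch(2)  # dense motion field (dx, dy)
        self.classifier = nn.Sequential(
            nn.Linear(3 * 512, 128),
            nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, image, spectrogram, motion):
        # Late fusion: concatenate per-modality features, then classify.
        feats = torch.cat(
            [self.image_branch(image),
             self.audio_branch(spectrogram),
             self.motion_branch(motion)],
            dim=1,
        )
        return self.classifier(feats)  # logits over Sit/Stand/Walk/Smell

# Example forward pass with dummy tensors (batch of 2).
model = MultiModalDogNet()
logits = model(
    torch.randn(2, 3, 128, 128),   # color image
    torch.randn(2, 1, 128, 128),   # STFT spectrogram
    torch.randn(2, 2, 128, 128),   # motion field
)
print(logits.shape)  # torch.Size([2, 4])
```

A single-modality ablation, as reported in the abstract, would correspond to training and evaluating one branch plus the classifier on its own.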
