Abstract
Mice use ultrasonic vocalizations (USVs) to convey a variety of socially relevant information. These vocalizations are affected by the sex, age, strain, and emotional state of the emitter and can thus be used to characterize it. Current tools used to detect and analyze murine USVs rely on user input and image processing algorithms to identify USVs, and therefore require ideal recording environments. More recent tools, which use convolutional neural network (CNN) models to identify vocalization segments, perform considerably better than these earlier tools but do not exploit the sequential structure of audio vocalizations. In contrast, human voice recognition models were designed explicitly for audio processing; they combine the advantages of CNN models with recurrent models that capture the sequential nature of audio. Here we describe the HybridMouse software: an audio analysis tool that combines convolutional (CNN) and recurrent (RNN) neural networks for automatically identifying, labeling, and extracting recorded USVs. Following training on manually labeled audio files recorded in various experimental conditions, HybridMouse outperformed the most commonly used deep-learning-based benchmark model in accuracy and precision. Moreover, it does not require user input and produces reliable detection and analysis of USVs recorded under harsh experimental conditions. We suggest that HybridMouse will enhance the analysis of murine USVs and facilitate their use in scientific research.
Highlights
Many vertebrates use species-specific vocal communications for social interactions (Todt and Naguib, 2000; Wilkins et al, 2013; Chen and Wiens, 2020)
Current tools used to detect and analyze murine ultrasonic vocalizations (USVs) rely on user input and classical image processing algorithms to identify and clean USVs, requiring manual adjustments and ideal recording environments
The HybridMouse model extracts the timestamps of each USV, its frequency, and a clean denoised representation
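As a rough illustration of how per-USV timestamps might be derived from a detector's output, the sketch below turns a sequence of per-frame USV probabilities into (start, end) times. It is not the HybridMouse implementation: the function name `frames_to_segments`, the per-frame probability output, the 0.5 threshold, and the frame duration are all assumptions made for this example.

```python
from typing import List, Tuple


def frames_to_segments(
    probs: List[float],
    frame_dur: float,
    threshold: float = 0.5,  # assumed decision threshold, not from the paper
    min_dur: float = 0.0,    # assumed minimum segment length in seconds
) -> List[Tuple[float, float]]:
    """Group consecutive above-threshold frames into (start, end) times in seconds."""
    segments = []
    start = None  # index of the first frame of the segment currently open
    for i, p in enumerate(probs):
        if p >= threshold and start is None:
            start = i  # a new candidate USV begins at this frame
        elif p < threshold and start is not None:
            segments.append((start * frame_dur, i * frame_dur))
            start = None
    if start is not None:
        # The recording ended while a segment was still open.
        segments.append((start * frame_dur, len(probs) * frame_dur))
    # Drop segments shorter than the minimum duration.
    return [(s, e) for s, e in segments if e - s >= min_dur]


if __name__ == "__main__":
    # Hypothetical per-frame probabilities with 0.5 s frames.
    print(frames_to_segments([0.1, 0.9, 0.8, 0.2, 0.7], frame_dur=0.5))
```

Raising `min_dur` would be one simple way to discard very short detections; whether that suits a given recording setup is a judgment call.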
Summary
Many vertebrates use species-specific vocal communications for social interactions (Todt and Naguib, 2000; Wilkins et al, 2013; Chen and Wiens, 2020). More recent tools utilize machine learning and convolutional neural network (CNN) models (Van Segbroeck et al, 2017; Fonseca et al, 2021) to identify vocalization segments. DeepSqueak (DS), a recent benchmark tool, relies on neural networks to detect USVs (Coffey et al, 2019). It implements an object detection architecture, namely a region-based convolutional neural network (Faster-RCNN) (Ren et al, 2017). CNN models do not exploit the temporal correlations of audio signals and underperform under noisy recording conditions. Recurrent neural networks (RNNs) can compensate for these weaknesses by capturing long contextual dependencies and using prior knowledge, thereby offering better performance.