Abstract

Increases in the scale and complexity of behavioral data pose a growing challenge for data analysis. A common strategy involves replacing entire behaviors with small numbers of handpicked, domain-specific features, but this approach suffers from several crucial limitations. For example, handpicked features may miss important dimensions of variability, and correlations among them complicate statistical testing. Here, by contrast, we apply the variational autoencoder (VAE), an unsupervised learning method, to learn features directly from data and quantify the vocal behavior of two model species: the laboratory mouse and the zebra finch. The VAE converges on a parsimonious representation that outperforms handpicked features on a variety of common analysis tasks, enables the measurement of moment-by-moment vocal variability on the timescale of tens of milliseconds in the zebra finch, provides strong evidence that mouse ultrasonic vocalizations do not cluster as is commonly believed, and captures the similarity of tutor and pupil birdsong with qualitatively higher fidelity than previous approaches. In all, we demonstrate the utility of modern unsupervised learning approaches to the quantification of complex and high-dimensional vocal behavior.
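
For readers unfamiliar with the method, the following is standard background rather than material from the paper itself: a VAE consists of an encoder and a decoder trained jointly by maximizing the evidence lower bound (ELBO) on the data likelihood (Kingma and Welling, 2014),

$$\mathcal{L}(\theta, \phi; x) = \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] - D_{\mathrm{KL}}\big(q_\phi(z \mid x) \,\big\|\, p(z)\big),$$

where $q_\phi(z \mid x)$ is the encoder's approximate posterior over latent features $z$, $p_\theta(x \mid z)$ is the decoder likelihood, and $p(z)$ is a standard normal prior. The means of the approximate posterior are what typically serve as the learned, low-dimensional features.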

Highlights

  • The VAE discovers the features that best capture variability in the data, offering a nonlinear generalization of methods like Principal Components Analysis (PCA) and Independent Components Analysis (ICA) that adapts well to high-dimensional data such as natural images (Dai et al., 2018; Higgins et al., 2017). By applying this technique to collections of single syllables, encoded as time-frequency spectrograms, we looked for latent spaces underlying vocal repertoires across individuals, strains, and species, asking whether these data-dependent features might reveal aspects of vocal behavior overlooked by traditional acoustic metrics and provide a more principled means of assessing differences among these groups (see the illustrative sketch after this list)

  • First among our contributions, we show that the VAE's learned acoustic features outperform common sets of handpicked features in a variety of tasks, including capturing acoustic similarity, representing a well-studied effect of social context on zebra finch song, and comparing the ultrasonic vocalizations (USVs) of different mouse strains
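
As a concrete illustration of the approach described above, here is a minimal sketch of a VAE over single-syllable spectrograms. It is not the authors' implementation: the 128×128 input size, 32-dimensional latent space, and fully connected architecture are assumptions made here for brevity, and the paper's actual network and hyperparameters may differ.

```python
# Minimal VAE sketch for syllable spectrograms (illustrative only).
# Assumed: inputs are 128x128 spectrograms scaled to [0, 1]; the
# latent space is 32-dimensional. Both choices are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpectrogramVAE(nn.Module):
    def __init__(self, n_pixels=128 * 128, latent_dim=32, hidden=1024):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_pixels, hidden), nn.ReLU())
        self.to_mu = nn.Linear(hidden, latent_dim)      # posterior mean
        self.to_logvar = nn.Linear(hidden, latent_dim)  # posterior log-variance
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_pixels), nn.Sigmoid(),  # reconstruction in [0, 1]
        )

    def forward(self, x):
        h = self.encoder(x.flatten(start_dim=1))
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        # Reparameterization trick: sample z while keeping gradients.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.decoder(z), mu, logvar

def vae_loss(recon, x, mu, logvar):
    # Negative ELBO: reconstruction error plus KL divergence between
    # the approximate posterior and a standard normal prior.
    recon_err = F.mse_loss(recon, x.flatten(start_dim=1), reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_err + kl

# Usage: after training, the posterior means `mu` play the role of the
# learned acoustic features used for downstream analyses.
model = SpectrogramVAE()
batch = torch.rand(16, 128, 128)            # stand-in spectrogram batch
recon, mu, logvar = model(batch)
vae_loss(recon, batch, mu, logvar).backward()
```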

Introduction

Quantifying the behavior of organisms is of central importance to a wide range of fields including ethology, linguistics, and neuroscience. A major goal of these various lines of inquiry has been to develop methods for the quantitative analysis of vocal behavior, and these efforts have resulted in several powerful approaches that enable the automatic or semi-automatic analysis of vocalizations (Tchernichovski and Mitra, 2004; Coffey et al., 2019; Van Segbroeck et al., 2017; Sainburg et al., 2019; Tchernichovski et al., 2000; Mandelblat-Cerf and Fee, 2014; Mets and Brainard, 2018; Kollmorgen et al., 2020; Holy and Guo, 2005).
