DeepPoseKit, a software toolkit for fast and robust animal pose estimation using deep learning.

Jacob M Graving, Liang Li, Blair R Costelloe, Hemal Naik, Daniel Chae, Benjamin Koger, Iain D Couzin

doi:10.7554/elife.47994

Abstract

Quantitative behavioral measurements are important for answering questions across scientific disciplines, from neuroscience to ecology. State-of-the-art deep-learning methods offer major advances in data quality and detail by allowing researchers to automatically estimate the locations of an animal's body parts directly from images or videos. However, currently available animal pose estimation methods have limitations in speed and robustness. Here, we introduce a new easy-to-use software toolkit, DeepPoseKit, that addresses these problems using an efficient multi-scale deep-learning model, called Stacked DenseNet, and a fast GPU-based peak-detection algorithm for estimating keypoint locations with subpixel precision. These advances improve processing speed more than twofold with no loss in accuracy compared to currently available methods. We demonstrate the versatility of our methods with multiple challenging animal pose estimation tasks in laboratory and field settings, including groups of interacting individuals. Our work reduces barriers to using advanced tools for measuring behavior and has broad applicability across the behavioral sciences.

Highlights

Understanding the relationships between individual behavior, brain activity, and collective and social behaviors (Rosenthal et al, 2015; Strandburg-Peshkin et al, 2013; Jolles et al, 2017; Klibaite et al, 2017; Klibaite and Shaevitz, 2019) is a central goal of the behavioral sciences, a field that spans disciplines from neuroscience to psychology, ecology, and genetics.
Our methods build on the state of the art for individual pose estimation (Newell et al, 2016; Appendix 5), convolutional regression models (Jegou et al, 2017; Appendix 4: 'Encoder-decoder models'), and conventional computer vision algorithms (Guizar-Sicairos et al, 2008) to improve model efficiency and achieve faster, more accurate results on multiple challenging pose estimation tasks.
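Convolutional regression models of this kind are commonly trained to output one confidence map per keypoint rather than raw coordinates, with each annotated location encoded as a 2D Gaussian. The sketch below illustrates that encoding for a single keypoint on an output grid. It is a minimal illustration, not DeepPoseKit's code: the helper name `gaussian_confidence_map`, the isotropic Gaussian shape, and the `sigma` default are all assumptions.

```python
import math


def gaussian_confidence_map(height, width, x, y, sigma=1.0):
    """Encode one keypoint as a height x width grid of Gaussian confidence values.

    Hypothetical helper for illustration only. `x` is the column coordinate and
    `y` the row coordinate on the output grid; both may be fractional. The value
    is 1.0 exactly at (x, y) and decays with squared distance from that point.
    """
    two_sigma_sq = 2.0 * sigma * sigma
    return [
        [math.exp(-((col - x) ** 2 + (row - y) ** 2) / two_sigma_sq)
         for col in range(width)]
        for row in range(height)
    ]


# Example: a keypoint at column 3, row 5 on an 8 x 8 output grid.
cmap = gaussian_confidence_map(8, 8, x=3.0, y=5.0, sigma=1.0)
print(cmap[5][3])  # 1.0
```

If the model's output grid is downsampled relative to the input image, keypoint coordinates would first be divided by that (assumed) downsampling factor before encoding.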
We developed two model implementations, including a new model architecture that we call Stacked DenseNet, and a new method for processing confidence maps, called subpixel maxima, that provides fast and accurate peak detection for estimating keypoint locations with subpixel precision, even at low spatial resolutions.

Summary

Introduction

Understanding the relationships between individual behavior, brain activity (reviewed by Krakauer et al, 2017), and collective and social behaviors (Rosenthal et al, 2015; Strandburg-Peshkin et al, 2013; Jolles et al, 2017; Klibaite et al, 2017; Klibaite and Shaevitz, 2019) is a central goal of the behavioral sciences, a field that spans disciplines from neuroscience to psychology, ecology, and genetics. A cornerstone of this interdisciplinary revolution is the use of state-of-the-art computational tools, such as computer vision algorithms, to automatically measure locomotion and body posture (Dell et al, 2014). Such a rich description of animal movement allows for modeling, from first principles, the full behavioral repertoire of animals (Stephens et al, 2011; Berman et al, 2014b; Berman et al, 2016; Wiltschko et al, 2015; Johnson et al, 2016b; Todd et al, 2017; Klibaite et al, 2017; Markowitz et al, 2018; Klibaite and Shaevitz, 2019; Costa et al, 2019).

Methods

Results

Conclusion