Abstract

Neural architecture search has proven to be highly effective in the design of efficient convolutional neural networks that are better suited for mobile deployment than hand-designed networks. Hypothesizing that neural architecture search holds great potential for human pose estimation, we explore the application of neuroevolution, a form of neural architecture search inspired by biological evolution, in the design of 2D human pose networks for the first time. Additionally, we propose a new weight transfer scheme that enables us to accelerate neuroevolution in a flexible manner. Our method produces network designs that are more efficient and more accurate than state-of-the-art hand-designed networks. In fact, the generated networks process images at higher resolutions using less computation than previous hand-designed networks at lower resolutions, allowing us to push the boundaries of 2D human pose estimation. Our base network designed via neuroevolution, which we refer to as EvoPose2D-S, achieves comparable accuracy to SimpleBaseline while being 50% faster and 12.7x smaller in terms of file size. Our largest network, EvoPose2D-L, achieves new state-of-the-art accuracy on the Microsoft COCO Keypoints benchmark, is 4.3x smaller than its nearest competitor, and has similar inference speed. The code is publicly available at https://github.com/wmcnally/evopose2d.

Highlights

  • Two-dimensional human pose estimation is a visual recognition task dealing with the autonomous localization of anatomical human joints, or more broadly, “keypoints,” in RGB images and video [1]–[5].

  • We explore the application of neuroevolution [39], a realization of neural architecture search (NAS) inspired by evolution in nature, to 2D human pose estimation for the first time.

  • While we focus on 2D human pose estimation, we note that our neuroevolution approach is generally applicable to all types of deep networks.
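As background, a mutation-based neuroevolution loop has the following general shape. This is a toy sketch only: the genome encoding (per-stage channel widths), fitness function, and mutation operator are illustrative placeholders, not the paper's actual search space or objective. In a real search, fitness evaluation would involve training each candidate network, which is where a weight transfer scheme (warm-starting mutated children from their parent's weights) accelerates the process.

```python
import random

def fitness(genome):
    # Placeholder objective: in real NAS this would be validation accuracy
    # (possibly minus a compute penalty) after training the decoded network.
    # Here we simply prefer widths near 64 channels per stage.
    return -sum((w - 64) ** 2 for w in genome)

def mutate(genome, rng):
    # Perturb one stage's channel width. Under a weight transfer scheme,
    # the child network would inherit the parent's trained weights for
    # all unchanged stages rather than training from scratch.
    child = list(genome)
    i = rng.randrange(len(child))
    child[i] = max(8, child[i] + rng.choice([-8, 8]))
    return child

def evolve(generations=50, population_size=8, seed=0):
    rng = random.Random(seed)
    # Genome: per-stage channel widths of a hypothetical 4-stage backbone.
    population = [[rng.choice([16, 32, 48, 96]) for _ in range(4)]
                  for _ in range(population_size)]
    for _ in range(generations):
        # Elitism: keep the two fittest individuals as parents, then fill
        # the rest of the population with mutated children.
        parents = sorted(population, key=fitness, reverse=True)[:2]
        population = parents + [mutate(rng.choice(parents), rng)
                                for _ in range(population_size - 2)]
    return max(population, key=fitness)

best = evolve()
```

Because of elitism, the best fitness in the population never decreases across generations, which is why even this simple loop reliably converges toward the optimum of the toy objective.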


Introduction

Two-dimensional human pose estimation is a visual recognition task dealing with the autonomous localization of anatomical human joints, or more broadly, “keypoints,” in RGB images and video [1]–[5]. It is widely considered a fundamental problem in computer vision due to its many downstream applications, including action recognition [6]–[11] and human tracking [12]–[14]. In the commonly used top-down pipeline, a person detector first localizes each person in the image, and a single-person pose network then estimates the keypoints within each detected region. This paper focuses on the latter stage of this pipeline, but we emphasize that our method is applicable to the design of bottom-up human pose estimation networks [22], [25] as well.
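The two-stage structure of the top-down pipeline can be sketched as follows. The detector and pose-network callables are hypothetical stand-ins for illustration, not the paper's actual models; the second stage is the one addressed by the networks designed in this work.

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]        # person bounding box: x, y, width, height
Keypoints = List[Tuple[float, float]]  # one (x, y) coordinate per joint

def top_down_pose(image,
                  detect_people: Callable[..., List[Box]],
                  estimate_pose: Callable[..., Keypoints]) -> List[Keypoints]:
    # Stage 1: detect person bounding boxes in the full image.
    # Stage 2: run the single-person pose network on each detection.
    return [estimate_pose(image, box) for box in detect_people(image)]

# Toy stand-ins for a detector and a pose network, for illustration only:
# the "pose network" just returns each box's center as a single keypoint.
detect = lambda img: [(0, 0, 10, 20), (10, 0, 10, 20)]
pose = lambda img, box: [(box[0] + box[2] / 2, box[1] + box[3] / 2)]
poses = top_down_pose(None, detect, pose)  # one keypoint list per person
```

A consequence of this structure is that the pose network's cost scales with the number of detected people, which is why efficient single-person networks matter for the overall pipeline.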

