DoubleHigherNet: Coarse-to-Fine Precise Heatmap Bottom-Up Dynamic Pose Computer Intelligent Estimation

Yiheng Peng, Zhichun Jiang

doi:10.1088/1742-6596/2033/1/012068

Abstract

Accurate keypoint localization is necessary for bottom-up multi-person pose estimation methods to handle scale variation and crowded scenes. In this paper, we present DoubleHigherNet: a novel network that learns a scale-aware and precise heatmap representation for the bottom-up process using two high-resolution feature pyramids and coarse-to-fine training. The two feature pyramids in DoubleHigherNet consist of 1/4-resolution features and higher-resolution (1/2) maps generated by attention fusion blocks and transposed convolutions. Benefiting from the coarse-to-fine training strategy and from multi-resolution, coarse-fine heatmap aggregation, the proposed approach predicts keypoints more accurately and therefore performs better on difficult crowded scenes. DoubleHigherNet-w32 achieves competitive results on CrowdPose-test, surpassing all top-down methods and the state-of-the-art bottom-up method HigherHRNet-w32, which has a similar number of parameters to DoubleHigherNet-w32.
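The abstract describes producing 1/2-resolution maps from 1/4-resolution features with transposed convolutions and then aggregating heatmaps across resolutions. The following NumPy sketch illustrates both steps under stated assumptions. `deconv_out_size` follows the output-size formula documented for PyTorch's `ConvTranspose2d`; the kernel 4, stride 2, padding 1 defaults are an assumption, chosen because they exactly double the resolution. `aggregate_heatmaps` averages an upsampled 1/4 heatmap with a 1/2 heatmap, similar in spirit to HigherHRNet's heatmap aggregation. Nearest-neighbour upsampling is used only for brevity, and all function names are hypothetical; the paper's exact fusion rule is not given here.

```python
import numpy as np


def deconv_out_size(h_in, kernel=4, stride=2, padding=1, output_padding=0, dilation=1):
    """Output height of a transposed convolution (PyTorch ConvTranspose2d formula).

    The default hyperparameters are an assumption that doubles the input size.
    """
    return (h_in - 1) * stride - 2 * padding + dilation * (kernel - 1) + output_padding + 1


def upsample_nearest(hm, factor=2):
    """Nearest-neighbour upsampling of a (K, H, W) heatmap stack by an integer factor."""
    return hm.repeat(factor, axis=1).repeat(factor, axis=2)


def aggregate_heatmaps(hm_quarter, hm_half):
    """Average a 1/4-resolution and a 1/2-resolution heatmap stack at 1/2 resolution.

    Assumption: plain averaging after upsampling; DoubleHigherNet's actual
    coarse-fine aggregation rule may differ.
    """
    factor = hm_half.shape[1] // hm_quarter.shape[1]
    up = upsample_nearest(hm_quarter, factor=factor)
    assert up.shape == hm_half.shape, "heatmap stacks must align after upsampling"
    return (up + hm_half) / 2.0
```

For a 512x512 input, the 1/4-resolution map is 128x128, and `deconv_out_size(128)` returns 256, the 1/2-resolution size.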

Highlights

We propose DoubleHigherNet, a network with two cascaded feature pyramids
We evaluate the impact of our proposed attention fusion block, coarse-to-fine learning and coarse-fine heatmap aggregation
In this paper, we present DoubleHigherNet: a novel network designed for bottom-up multi-person pose estimation
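The highlights mention coarse-to-fine learning without specifying how the targets are built. One common way to express coarse-to-fine supervision, assumed here purely for illustration, is to supervise an earlier (coarse) stage with wider Gaussian keypoint heatmaps and a later (fine) stage with narrower ones. The function names and the sigma schedule `(4.0, 2.0)` are hypothetical and are not taken from the paper.

```python
import numpy as np


def gaussian_heatmap(h, w, cx, cy, sigma):
    """Render one keypoint as an unnormalized 2D Gaussian peaking at 1.0 at (cx, cy)."""
    ys, xs = np.mgrid[0:h, 0:w]  # row (y) and column (x) coordinate grids
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))


def coarse_to_fine_targets(h, w, cx, cy, sigmas=(4.0, 2.0)):
    """Hypothetical schedule: one target per stage, wide (coarse) first, narrow (fine) last."""
    return [gaussian_heatmap(h, w, cx, cy, s) for s in sigmas]


def stage_losses(preds, targets):
    """Mean squared error between each stage's prediction and its own target."""
    return [float(np.mean((p - t) ** 2)) for p, t in zip(preds, targets)]
```

Under this assumed scheme, the coarse target tolerates small localization errors early on, while the fine target penalizes them more sharply in the final stage.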

Summary

Introduction

A top-down method first employs a human detector such as Mask R-CNN (He et al. 2017) to obtain the bounding box of each person instance in the image, and then estimates the keypoints of each person within its box. The bottom-up process instead first determines the identity-free joint positions of all people in the input image by predicting heatmaps for the different body parts, and then groups these joints into individual person instances. This strategy improves the speed of bottom-up methods and makes real-time pose estimation achievable. Because they do not depend on a human detector, bottom-up methods also perform better on CrowdPose, a benchmark with many dense and difficult scenes. Since estimating small persons and large persons places conflicting demands on the network, HigherHRNet [3] introduces a feature pyramid to balance performance on persons of different scales.
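The first bottom-up step described in the introduction, turning predicted heatmaps into identity-free joint candidates, can be sketched as local-maximum detection. This NumPy version keeps pixels that equal the maximum of their 3x3 neighbourhood and exceed a score threshold; the function names and the 0.1 threshold are illustrative assumptions. The grouping step that assigns candidates to person instances (for example, the associative embedding used by HigherHRNet) is not shown.

```python
import numpy as np


def local_max_3x3(hm):
    """3x3 max filter over a 2D float array; -inf padding keeps border pixels valid."""
    h, w = hm.shape
    padded = np.pad(hm, 1, mode="constant", constant_values=-np.inf)
    windows = [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
    return np.max(np.stack(windows), axis=0)


def detect_joints(heatmaps, threshold=0.1):
    """Return identity-free candidates (joint_type, y, x, score) from a (K, H, W) stack.

    A pixel counts as a peak if it equals its 3x3 neighbourhood maximum and exceeds
    the threshold; flat plateaus can therefore yield several adjacent candidates.
    """
    candidates = []
    for k, hm in enumerate(heatmaps):
        peaks = (hm == local_max_3x3(hm)) & (hm > threshold)
        for y, x in zip(*np.nonzero(peaks)):
            candidates.append((k, int(y), int(x), float(hm[y, x])))
    return candidates
```

The candidates carry a joint type but no person identity, which is exactly why a separate grouping stage is required in bottom-up methods.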

Methods

Results

Conclusion