Abstract

In this paper, we propose a cluster-wise feature aggregation network that exploits multi-level contextual association for multi-person pose estimation. A popular recent approach to pose estimation is to extract the local maximum response from each detection heatmap, where each heatmap is trained for a specific keypoint type. To exploit more contextual information, our network simultaneously learns complementary semantic information that encourages the detected keypoints to satisfy contextual constraints. Specifically, our network uses dense and sparse branches to generate paired multi-peak detection heatmaps for clusters of keypoints. To enhance the features passing through the network, we aggregate information both within and across branches. The in-branch aggregation enriches the detection features in each branch by absorbing holistic human-region attention. The cross-branch aggregation further strengthens the detection features by fusing global and local context information between the dense and sparse branches. We demonstrate competitive performance of our network on the benchmark dataset for multi-person pose estimation.
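
The heatmap decoding step mentioned above, extracting local maximum responses from a detection heatmap, can be illustrated with a minimal sketch. This is not the network proposed in the paper; the function name `extract_peaks`, the 8-connected neighbourhood, and the `threshold` value are assumptions chosen for illustration only.

```python
import numpy as np

def extract_peaks(heatmap, threshold=0.1):
    """Return (row, col, score) for each local maximum above `threshold`.

    Illustrative only: a pixel counts as a peak if it is >= all 8 neighbours,
    so a flat plateau above the threshold yields several peaks.
    """
    heatmap = np.asarray(heatmap, dtype=float)
    h, w = heatmap.shape
    # Pad with -inf so border pixels can still be compared with "neighbours".
    padded = np.pad(heatmap, 1, mode="constant", constant_values=-np.inf)
    is_peak = heatmap > threshold
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            neighbour = padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
            is_peak &= heatmap >= neighbour
    rows, cols = np.nonzero(is_peak)
    return [(int(r), int(c), float(heatmap[r, c])) for r, c in zip(rows, cols)]

# A multi-peak heatmap, e.g. one keypoint type with two people in the image.
demo = np.zeros((5, 5))
demo[1, 1] = 0.9   # first person
demo[3, 3] = 0.6   # second person
demo[1, 2] = 0.5   # shoulder of the first peak, suppressed by its neighbour
peaks = extract_peaks(demo)
```

In the multi-person setting a single heatmap may contain several peaks of this kind, one per person; deciding which peaks belong to the same person is the part that benefits from the additional contextual information described above.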