Abstract

Research on multi-person pose estimation has advanced considerably in recent years. However, multi-person pose estimation in complex environments remains challenging. In particular, existing methods handle two situations poorly: pedestrians who are not upright, or even inverted, in the image, and pedestrians of different scales appearing in the same image. To address these problems, we propose a Progressive Rotation Correction Module (PRCM) and a Scale-Invariance Module (SIM) based on multi-scale feature fusion. The PRCM targets the case where pedestrians appear rotated or even inverted in the image; it is divided into three stages that gradually correct an inverted human to an upright one. The SIM is designed to handle multi-scale problems: dilated convolutions with different receptive fields extract multi-scale information, and the extracted multi-scale features (different semantic information in different feature maps) are then fused. Experimental results show that our algorithm reaches an AP of 72.0% on the COCO2017 dataset, demonstrating that the proposed method is superior to state-of-the-art methods.
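The three-stage idea behind the PRCM can be pictured with a minimal numerical sketch: each stage estimates the remaining rotation of the person and counter-rotates by that estimate, so the residual misalignment shrinks stage by stage. The estimator accuracy (a fixed 70% recovery per stage), the function names, and the stage count used below are illustrative assumptions, not details taken from the paper.

```python
def angle_estimate(true_angle_deg, recovery=0.7):
    """Stand-in for an imperfect angle-regression network: it recovers
    only a fixed fraction (here 70%) of the true remaining rotation."""
    return true_angle_deg * recovery

def progressive_correction(initial_angle_deg, stages=3):
    """Apply estimate-then-counter-rotate repeatedly; the residual
    rotation shrinks geometrically across the stages."""
    residual = initial_angle_deg
    history = [residual]
    for _ in range(stages):
        correction = angle_estimate(residual)
        residual -= correction  # counter-rotate by the estimated angle
        history.append(residual)
    return residual, history

# A fully inverted pedestrian (180 degrees off upright) after three stages:
final, history = progressive_correction(180.0, stages=3)
# residuals per stage: 180.0 -> 54.0 -> 16.2 -> 4.86 degrees
```

Under this assumption, a single-shot corrector would leave a 54-degree error, while three progressive stages reduce it below 5 degrees, which is the motivation for a staged rather than one-shot correction.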

Highlights

  • Human pose estimation has always been a challenging research area in computer vision

  • Inverted and rotated images are added to the training set so that the Progressive Rotation Correction Module (PRCM) learns to adapt to complex scenes

  • In this paper, we propose a novel method for multi-person pose estimation in complex environments based on progressive rotation correction and multi-scale feature fusion

Summary

INTRODUCTION

Human pose estimation has always been a challenging research area in computer vision. Newell's Stacked Hourglass Network learns local features of keypoints through a multi-scale receptive-field mechanism [5]. The Hourglass module is designed to capture the local information contained in images at different scales, while the final pose estimate requires a coherent understanding of the whole body. Trident Networks use different dilated convolutions to adapt to objects of different scales. Inspired by these studies, for multi-scale problems we enlarge the receptive field on large-scale targets to obtain keypoint information and reduce it appropriately on small-scale targets. The corrected human image is sent to the Scale-Invariance Module (SIM) for further feature extraction. In this module, multi-scale information is learned through convolution kernels with different receptive fields, and the extracted information is fused to deal with multi-scale problems. The symbol ⊗ represents a convolution operation, and ⊕ represents a fusion operation.
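The receptive-field idea behind the parallel dilated branches can be illustrated with a minimal sketch: a k-point kernel with dilation rate d spans d·(k−1)+1 input positions, so stacking branches with dilations 1, 2, and 4 yields receptive fields of 3, 5, and 9 from the same 3-point kernel. The 1-D convolution, kernel values, and branch dilations below are illustrative assumptions, not the paper's implementation.

```python
def effective_kernel_size(k, d):
    """Input span covered by a k-point kernel with dilation rate d."""
    return d * (k - 1) + 1

def dilated_conv1d(x, w, d):
    """'Valid' 1-D convolution (the ⊗ operation) of signal x with
    kernel w at dilation d: taps are spaced d positions apart."""
    k = len(w)
    span = effective_kernel_size(k, d)
    return [sum(w[j] * x[i + j * d] for j in range(k))
            for i in range(len(x) - span + 1)]

# Three parallel 3-point branches with increasing dilation, as in a
# multi-branch scale module: same kernel size, growing receptive field.
spans = {d: effective_kernel_size(3, d) for d in (1, 2, 4)}
# spans == {1: 3, 2: 5, 4: 9}

# One branch applied to a toy signal (dilation 2 samples every other point):
out = dilated_conv1d([1, 2, 3, 4, 5], [1, 1, 1], 2)  # [1 + 3 + 5] == [9]
```

Because dilation enlarges the receptive field without adding parameters, the branches can share a cheap 3×3 kernel while each "sees" a different scale; fusing (⊕) their aligned outputs then combines the different semantic levels.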

JOINT TRAINING OF TWO MODULES
EXPERIMENT ANALYSIS
EXPERIMENTAL ENVIRONMENT AND EVALUATION METRIC
EVALUATION METRIC
COMPARISON WITH STATE-OF-THE-ART METHODS
CONCLUSION
