Abstract

Pedestrian detection and human pose estimation are instructive for reconstructing a three-dimensional scenario and for robot navigation, particularly when large amounts of vision data are captured using various data-recording techniques. Under an unrestricted capture scheme, which produces occlusions or blurring, the information describing each part of a human body, and the relationships between parts or even between different pedestrians, may be only partially present in a still image. To address this, a multi-layered spatial virtual human pose reconstruction framework is presented in this study to recover the deficient information in planar images. In this framework, a hierarchical parts-based deep model detects body parts using the restricted information available in a still image and is combined with spatial Markov random fields to re-estimate accurate joint positions in the deep network. The planar estimation results are then mapped onto a virtual three-dimensional space using multiple constraints to recover the deficient spatial information. The proposed approach can be viewed as a general pre-processing method to guide the generation of continuous three-dimensional motion data. The experimental results demonstrate the effectiveness and usability of the proposed approach.

Highlights

  • In recent years, as powerful resources of many kinds have spread across the Internet, how to capture key information from large amounts of data has become of great interest to researchers. Vision data, typically still images or video clips, are a primary form of data used to record scene or human-activity information

  • Because the scale and input size of the two datasets differ, the PARSE and FLIC datasets have been manually preprocessed to ensure that the image and human-body-part sizes are at the same scale and that the label patterns of these datasets are unified in the training process

  • In this sub-section, we evaluate the proposed body-part detection module on three major datasets: the Caltech dataset, the PARSE dataset and the FLIC dataset


Introduction

As powerful resources of many kinds have spread across the Internet, how to capture key information from large amounts of data has become of great interest to researchers. Vision data, typically still images or video clips, are a primary form of data used to record scene or human-activity information. Pedestrian detection and pose reconstruction are typically used to capture key information about pedestrians or athletes, owing to their practical applications in scene surveillance, motion-animation reconstruction and intelligent robot simulation or navigation. The problem can be stated as follows: given unrestricted vision input, which is typically sparse, deficient and multi-scale, the three-dimensional joint information of a real human is difficult to recover precisely using finite resources. Common features, such as the contour or edge histogram descriptors in [1,2], local intensity feature descriptors such as SIFT (scale-invariant feature transform) in [3] and HOG (histogram of oriented gradients) in [4,5,6,7], and other descriptors based on color or texture, can be used to distinguish each body part from the background in an image while simultaneously maintaining a tolerable internal variance within each component
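To make the descriptor idea concrete, the following is a minimal sketch of a single-cell orientation histogram in the spirit of HOG. It is an illustrative simplification (one cell, no block normalization or sliding windows, as in the full Dalal–Triggs descriptor), and the function name `hog_cell` and its parameters are our own for this example, not part of the paper's method.

```python
import math

def hog_cell(gray, bins=9):
    """Orientation histogram for one cell of a grayscale image.

    `gray` is a 2-D list of intensities. Gradients are taken with
    central differences; each interior pixel votes its gradient
    magnitude into one of `bins` unsigned-orientation bins.
    Illustrative sketch only -- not the full block-normalized HOG.
    """
    h, w = len(gray), len(gray[0])
    hist = [0.0] * bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gray[y][x + 1] - gray[y][x - 1]   # horizontal gradient
            gy = gray[y + 1][x] - gray[y - 1][x]   # vertical gradient
            mag = math.hypot(gx, gy)               # gradient magnitude
            ang = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned orientation
            hist[int(ang / (180.0 / bins)) % bins] += mag
    total = sum(hist) or 1.0
    return [v / total for v in hist]               # L1-normalized histogram

# A vertical step edge produces purely horizontal gradients,
# so all the histogram mass lands in the first orientation bin.
edge = [[0, 0, 0, 255, 255, 255] for _ in range(6)]
descriptor = hog_cell(edge)
```

A real detector would tile such cells over the image, normalize them in overlapping blocks, and feed the concatenated vector to a classifier; this sketch only shows where the orientation statistics come from.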
