Abstract

Unconstrained hand detection in still images plays an important role in many hand-related vision problems, such as hand tracking, gesture analysis, human action recognition, human-machine interaction, and sign language recognition. Although hand detection has been studied extensively for decades, it remains a challenging task with many open problems. The contributing factors include heavy occlusion, low resolution, varying illumination conditions, diverse hand gestures, and the complex interactions between hands and objects or other hands. In this paper, we propose a multiscale deep learning model for unconstrained hand detection in still images. Deep learning models, and deep convolutional neural networks (CNNs) in particular, have achieved state-of-the-art performance on many vision benchmarks. Building on the region-based CNN (R-CNN) framework, we propose a hand detection scheme that operates on candidate regions generated by a generic region proposal algorithm, followed by multiscale information fusion from the popular VGG16 model. Two benchmark datasets were used to validate the proposed method, namely the Oxford Hand Detection Dataset and the VIVA Hand Detection Challenge. We achieved state-of-the-art results on the Oxford Hand Detection Dataset and satisfactory performance in the VIVA Hand Detection Challenge.
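To make the pipeline described above more concrete (candidate regions from a generic proposal algorithm, VGG16 convolutional features, and per-region classification), the following PyTorch/torchvision sketch outlines one plausible Fast R-CNN-style instantiation. The pooling size, the binary classifier head, and the use of RoI pooling at stride 16 are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch of a region-proposal + VGG16 hand detector
# (illustrative only; head sizes and pooling parameters are assumptions).
import torch
import torchvision
from torchvision.ops import roi_pool

# VGG16 backbone pretrained on ImageNet; keep only the convolutional layers
# and drop the final max-pool so the feature map has stride 16.
vgg = torchvision.models.vgg16(weights=torchvision.models.VGG16_Weights.IMAGENET1K_V1)
backbone = vgg.features[:-1]

# Assumed binary hand / non-hand classifier over pooled region features.
classifier = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(512 * 7 * 7, 4096), torch.nn.ReLU(),
    torch.nn.Linear(4096, 2),
)

def detect_hands(image, proposals):
    """image: 1x3xHxW tensor; proposals: Nx4 boxes (x1, y1, x2, y2) in image
    coordinates, e.g. from selective search or any generic proposal method."""
    feat = backbone(image)                          # 1x512x(H/16)x(W/16)
    # Pool a fixed 7x7 feature for each proposal (Fast R-CNN-style RoI pooling).
    regions = roi_pool(feat, [proposals], output_size=(7, 7), spatial_scale=1.0 / 16)
    scores = classifier(regions)                    # Nx2 hand / background logits
    return scores.softmax(dim=-1)[:, 1]             # per-proposal hand probability
```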

Highlights

  • Robust hand detection in unconstrained environments is one of the most important yet challenging problems in computer vision

  • Our improvements to the convolutional neural network (CNN) architecture are not tied to a particular model; our design is built upon the VGG16 model [35], a widely applied deep CNN model

  • This paper presents a Multiscale Fast region-based CNN (R-CNN) approach to accurately detect human hands in unconstrained images


Summary

Introduction

Robust hand detection in unconstrained environments is one of the most important yet challenging problems in computer vision. A part-based model, namely the Deformable Part Model (DPM) proposed by Felzenszwalb et al. [7], led object detection before 2014. This method applies HOG features of images, with latent parts of objects forming a deformable graphical model, and achieved promising results. An appropriately designed CNN model can learn multiple stages of invariant features of an image, and CNN-based object detection is generally an end-to-end system that is jointly optimized for both feature representation and classification. Rather than designing complex structures, as in [19], to handle the scale variations of objects, we propose a multiscale detection system for hand objects that exploits the scale-rich representations provided by a single CNN. To achieve a multiscale representation of hand objects, we propose a strategy that integrates the features from multiple layers of a CNN model.
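The excerpt above does not specify which layers are fused or how, so the following PyTorch sketch should be read as one plausible way to integrate features from multiple VGG16 stages for each candidate region; the chosen layers (conv3_3, conv4_3, conv5_3), the L2 normalization, and the channel-wise concatenation are assumptions for illustration.

```python
# Sketch of fusing RoI features taken from several VGG16 stages
# (layer indices, L2 normalization, and concatenation are assumptions).
import torch
import torchvision
from torchvision.ops import roi_pool

features = torchvision.models.vgg16(weights=None).features
# Indices of the last ReLU in the conv3, conv4 and conv5 blocks of torchvision's VGG16,
# and the stride of each feature map relative to the input image.
taps = {"conv3": 15, "conv4": 22, "conv5": 29}
strides = {"conv3": 4, "conv4": 8, "conv5": 16}

def multiscale_roi_features(image, proposals, output_size=(7, 7)):
    """Pool each proposal from three feature maps of different resolution,
    L2-normalize them across channels, and concatenate along the channel axis."""
    maps, x = {}, image
    for i, layer in enumerate(features):
        x = layer(x)
        for name, idx in taps.items():
            if i == idx:
                maps[name] = x
    pooled = []
    for name, fmap in maps.items():
        p = roi_pool(fmap, [proposals], output_size, spatial_scale=1.0 / strides[name])
        pooled.append(torch.nn.functional.normalize(p, dim=1))  # L2 norm over channels
    return torch.cat(pooled, dim=1)  # N x (256 + 512 + 512) x 7 x 7 fused descriptor
```

Pooling each region at the scale of its source layer keeps finer spatial detail from the earlier layers and more semantic information from the later ones, which is the usual motivation for this kind of multi-layer fusion.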

