Abstract

Fast hand detection plays an important role in intelligent homecare systems because it is closely tied to many human-related tasks, such as human activity and behavior recognition. Although hand detection has been studied widely over the last decade, fast hand detection remains a challenge. In this paper, we employ a single shot multibox detector (SSD) as the base architecture and propose a novel cross-resolution feature fusion (CFF) approach that adds contextual and semantic information to shallower layers for fast hand detection. Our approach improves performance significantly, especially for small instances, owing to two modules: a narrow atrous spatial pyramid pooling (N-ASPP) module and a richer semantic information generation (RSIG) module. The proposed N-ASPP module employs atrous convolution with different atrous rates to capture multiscale context information. The proposed RSIG module uses a resolution-matching submodule to enlarge an input feature map and a ResNeXt block to exploit richer semantic information. In verification experiments, the proposed 2CFF-SSD model achieved 66.41% average precision (AP) on the Oxford hand test set while running at 55 frames per second on a GTX 1080 Ti graphics processing unit. This is more accurate than the original SSD method and many other state-of-the-art methods, while maintaining high speed.
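To illustrate how different atrous rates capture context at several scales, the sketch below applies one small kernel to a 1D signal with its taps spaced `rate` samples apart. It is a simplified illustration only, not the N-ASPP module itself: the 1D setting, the rates (1, 2, 4), and all function names are assumptions chosen for the example, and the abstract does not state the rates or layer configuration used in the paper.

```python
def effective_kernel_size(kernel_size: int, rate: int) -> int:
    """Span covered by a kernel whose taps are `rate` samples apart."""
    return kernel_size + (kernel_size - 1) * (rate - 1)


def atrous_conv1d(signal, kernel, rate):
    """Valid-mode 1D atrous (dilated) cross-correlation in plain Python."""
    span = effective_kernel_size(len(kernel), rate)
    out = []
    for start in range(len(signal) - span + 1):
        # Tap i of the kernel reads the sample i * rate positions ahead.
        out.append(sum(w * signal[start + i * rate] for i, w in enumerate(kernel)))
    return out


def multi_rate_context(signal, kernel, rates):
    """Apply the same kernel at several rates, one output branch per rate."""
    return {rate: atrous_conv1d(signal, kernel, rate) for rate in rates}


if __name__ == "__main__":
    signal = [1, 2, 3, 4, 5, 6, 7, 8, 9]
    kernel = [1, 1, 1]
    # Hypothetical rates, chosen only for illustration.
    branches = multi_rate_context(signal, kernel, rates=(1, 2, 4))
    for rate, branch in branches.items():
        print(rate, effective_kernel_size(len(kernel), rate), branch)
```

With a 3-tap kernel, rates 1, 2, and 4 cover spans of 3, 5, and 9 samples. The number of weights stays the same, so a larger rate widens the context without adding parameters.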
