Abstract

Accurate hand joint detection from images is a fundamental problem that underpins many applications in computer vision and human–computer interaction. This paper presents a two-stage network that detects hand joints from a single unmarked image using serial-parallel multi-scale feature fusion. In stage I, hand regions are located by an encoder-decoder network, and the features of each detected hand region are extracted by a shallow spatial hand feature representation module. The extracted hand features are then fed into stage II, which consists of serially connected feature extraction modules with similar structures, called "multi-scale feature fusion" (MSFF) modules. Each MSFF module contains parallel multi-scale feature extraction branches that generate initial hand joint heatmaps; these initial heatmaps are then mutually reinforced according to the anatomical relationships between hand joints. In terms of hand joint detection accuracy, the proposed network outperforms state-of-the-art methods on four datasets, RHD, HS, MPII & NZSL, and DCD8-6000, with PCK@0.2 scores of 0.94, 0.92, 0.84, and 0.97, respectively. Processing one hand in an image takes between 24 and 37 ms, which is fast enough to support many real-time applications.
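To make the stage I / stage II description concrete, the following is a minimal PyTorch sketch of how such a serial-parallel design could be organized. All of it is an illustrative assumption: the layer widths, the kernel sizes of the parallel branches, the number of MSFF stages, the use of 21 joints per hand, and the feeding of previous-stage heatmaps back into the next stage (a loose stand-in for the anatomical reinforcement step) are not taken from the paper, and the stage-I encoder-decoder hand detector is omitted entirely.

```python
# Minimal sketch of a two-stage, serial-parallel multi-scale feature fusion network.
# All module names, channel widths, scale choices, and stage counts are assumptions,
# not the paper's actual architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MSFF(nn.Module):
    """One multi-scale feature fusion (MSFF) stage: parallel branches with different
    receptive fields are fused together with the previous stage's heatmaps, and a
    head predicts refined per-joint heatmaps for the next stage."""

    def __init__(self, in_ch=64, num_joints=21):
        super().__init__()
        # Parallel multi-scale branches (kernel sizes 3/5/7 are an assumption).
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, in_ch, kernel_size=k, padding=k // 2)
            for k in (3, 5, 7)
        ])
        self.fuse = nn.Conv2d(in_ch * 3 + num_joints, in_ch, kernel_size=1)
        self.heatmap_head = nn.Conv2d(in_ch, num_joints, kernel_size=1)

    def forward(self, feat, prev_heatmaps):
        multi_scale = [F.relu(b(feat)) for b in self.branches]
        fused = self.fuse(torch.cat(multi_scale + [prev_heatmaps], dim=1))
        heatmaps = self.heatmap_head(F.relu(fused))
        return fused, heatmaps


class HandJointNet(nn.Module):
    """Stage I here is reduced to a shallow spatial feature extractor applied to an
    already-detected hand crop; stage II is a serial chain of MSFF stages."""

    def __init__(self, num_joints=21, num_stages=3, width=64):
        super().__init__()
        self.shallow = nn.Sequential(
            nn.Conv2d(3, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.init_heatmaps = nn.Conv2d(width, num_joints, 1)
        self.stages = nn.ModuleList(
            [MSFF(width, num_joints) for _ in range(num_stages)]
        )

    def forward(self, hand_crop):
        feat = self.shallow(hand_crop)
        heatmaps = self.init_heatmaps(feat)
        all_heatmaps = [heatmaps]
        for stage in self.stages:
            feat, heatmaps = stage(feat, heatmaps)
            all_heatmaps.append(heatmaps)
        # Returning every stage's heatmaps allows intermediate supervision,
        # a common choice for serially stacked heatmap networks.
        return all_heatmaps


if __name__ == "__main__":
    net = HandJointNet()
    crop = torch.randn(1, 3, 128, 128)  # a detected hand region, resized
    outputs = net(crop)
    print([o.shape for o in outputs])  # each: torch.Size([1, 21, 128, 128])
```

For reading the reported numbers: PCK@0.2 is the fraction of predicted joints whose distance to the ground-truth joint is below 0.2 times a normalization length; for hand datasets this normalization is commonly tied to the hand's bounding-box size, though the exact normalization used here is not stated in the abstract.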
