Abstract

Hand detection plays an important role in human–computer interaction. Because hands offer a convenient and natural means of interaction, hand detection is increasingly used in virtual reality, remote control, and other fields. However, owing to complex backgrounds and the diversity of hand postures, the YOLOv4 algorithm suffers from low accuracy and robustness in hand detection. Therefore, a YOLOv4-HAND network, improved from YOLOv4, is proposed to address this problem. First, we use dilated convolution to build a feature-enhancement pyramid that enables the network to expand semantic information. Second, to better detect hands at different scales, we design a multiscale attention module that captures the correlation of channel information across scales. Third, we design a detection head that incorporates a spatial attention module to compensate for the network's lack of spatial contextual location correlation. Finally, we use soft nonmaximum suppression to reduce the impact of occlusion. The results show that the YOLOv4-HAND network achieves 83.22% and 93.95% mAP on the publicly available Oxford Hand and EgoHands datasets, respectively. Compared with recent methods, the YOLOv4-HAND network effectively improves the accuracy of hand detection for practical applications.
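The soft nonmaximum suppression mentioned above decays the scores of overlapping detections instead of discarding them outright, which helps retain partially occluded hands. A minimal sketch of the Gaussian-decay variant in NumPy is given below; the function names, `sigma` value, and score threshold are illustrative assumptions, not the paper's exact settings:

```python
import numpy as np

def iou(box, boxes):
    """IoU of one box against an array of boxes; boxes are [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.maximum(0.0, x2 - x1) * np.maximum(0.0, y2 - y1)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter)

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian soft-NMS: multiply overlapping scores by exp(-IoU^2 / sigma)
    rather than suppressing them to zero, as in hard NMS."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float).copy()
    keep = []
    idxs = np.arange(len(scores))
    while len(idxs) > 0:
        # Pick the highest-scoring remaining box and keep it.
        top = idxs[np.argmax(scores[idxs])]
        keep.append(int(top))
        idxs = idxs[idxs != top]
        if len(idxs) == 0:
            break
        # Decay the scores of boxes that overlap the kept box.
        overlaps = iou(boxes[top], boxes[idxs])
        scores[idxs] *= np.exp(-(overlaps ** 2) / sigma)
        # Drop boxes whose decayed score falls below the threshold.
        idxs = idxs[scores[idxs] > score_thresh]
    return keep, scores

# Two heavily overlapping boxes plus one distant box: all three survive,
# but the overlapped box's score is decayed rather than zeroed.
boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]]
scores = [0.9, 0.8, 0.7]
keep, new_scores = soft_nms(boxes, scores)
```

Under hard NMS with a typical IoU threshold, the second box (IoU ≈ 0.68 with the first) would be removed entirely; soft-NMS instead keeps it with a reduced score, which is what mitigates the occlusion problem the abstract describes.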
