Abstract

Conventional human keypoint detection, which relies on visible-light images, is not applicable in low-light and nighttime conditions. In this work, we use infrared images for multi-person keypoint detection, which makes computer vision tasks such as action recognition and behavior analysis applicable in complex illumination environments. Taking the physical characteristics of infrared imaging into account, we design a top-down solution that first uses a single-stage object detection network, YOLO, to predict the bounding box of each human body, and then feeds each detected human body into a subsequent human keypoint detection network. We choose SimpleBaseline, a well-known network for human keypoint detection on visible images, as the base network. Since infrared images are blurry and low in resolution, we use targeted feature fusion, channel attention, and spatial attention to capture the features of the infrared image. In addition, we use depthwise separable convolution to reduce the number of parameters in the network. No benchmark infrared image dataset for multi-person keypoint detection exists in the literature, so we construct an infrared image dataset containing 1500 annotated images carefully selected from several public infrared pedestrian datasets. Extensive experimental results show that, compared with SimpleBaseline, our method achieves nearly the same performance on the visible COCO dataset but about 8% higher AP on the self-built infrared dataset.
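To illustrate why depthwise separable convolution reduces the parameter count, the following minimal sketch compares the weights of a standard convolution with those of a depthwise convolution followed by a 1x1 pointwise convolution. The layer sizes (256 input channels, 256 output channels, 3x3 kernel) are hypothetical values chosen for illustration, not the configuration used in this work, and bias terms are ignored.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a depthwise k x k conv plus a 1 x 1 pointwise conv (bias ignored)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes, chosen only for illustration.
c_in, c_out, k = 256, 256, 3

std = standard_conv_params(c_in, c_out, k)
sep = depthwise_separable_params(c_in, c_out, k)
ratio = sep / std  # equals 1 / c_out + 1 / k**2

print(f"standard: {std}, separable: {sep}, ratio: {ratio:.3f}")
```

For a 3x3 kernel the ratio approaches 1/9 as the number of output channels grows, which is where most of the saving comes from.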
