The extraction of 3D human pose and body shape details from a single monocular image is a significant challenge in computer vision. Traditional methods use RGB images, but these are constrained by varying lighting and occlusions. However, cutting-edge developments in imaging technologies have introduced new techniques such as single-pixel imaging (SPI) that can surmount these hurdles. In the near-infrared spectrum, SPI demonstrates impressive capabilities in capturing a 3D human pose. This wavelength can penetrate clothing and is less influenced by lighting variations than visible light, thus providing a reliable means to accurately capture body shape and pose data, even in difficult settings. In this work, we explore the use of an SPI camera operating in the NIR with time-of-flight (TOF) at bands 850-1550nm as a solution to detect humans in nighttime environments. The proposed system uses the vision transformers (ViT) model to detect and extract the characteristic features of humans for integration over a 3D body model SMPL-X through 3D body shape regression using deep learning. To evaluate the efficacy of NIR-SPI 3D image reconstruction, we constructed a laboratory scenario that simulates nighttime conditions, enabling us to test the feasibility of employing NIR-SPI as a vision sensor in outdoor environments. By assessing the results obtained from this setup, we aim to demonstrate the potential of NIR-SPI as an effective tool to detect humans in nighttime scenarios and capture their accurate 3D body pose and shape.
Read full abstract