Heart rate (HR) and respiratory rate (RR) are two critical physiological parameters that can be estimated from video recordings. However, the accuracy of remote HR and RR estimation is degraded by fluctuations in ambient illumination. To address this adverse effect, we propose a fore-background spatiotemporal (FBST) method for estimating HR and RR from videos captured by consumer-grade cameras. First, we identify the foreground regions of interest (ROIs) on the face and chest, as well as the background ROIs in non-body areas of the videos. Next, we construct the foreground and background spatiotemporal maps based on the dichromatic reflectance model. We then introduce a lightweight network equipped with adaptive spatiotemporal layers to process the spatiotemporal maps and automatically generate a feature map of the non-illumination perturbation pulses. This feature map serves as input to a ResNet-18 network that estimates the physiological rhythm. Finally, we extract pulse signals and estimate HR and RR concurrently. Experiments conducted on three public datasets and one private dataset demonstrate the superiority of the proposed FBST method in terms of accuracy and computational efficiency. These findings provide novel insights into non-intrusive human physiological measurements using common devices.
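As a rough illustration of the final step only (estimating HR and RR concurrently from an extracted signal), the sketch below picks the dominant spectral peak inside a heart-rate band and a respiratory band. It is not the FBST network described above: the function name `dominant_rate_per_minute`, the band limits (0.7 to 3.0 Hz and 0.1 to 0.5 Hz), the 30 fps sampling rate, and the synthetic trace are all assumptions chosen for the example.

```python
import numpy as np


def dominant_rate_per_minute(signal, fs, lo_hz, hi_hz):
    """Return the strongest frequency in [lo_hz, hi_hz], in cycles per minute.

    Hypothetical helper for illustration; band limits are assumptions.
    """
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()                                   # remove the DC offset
    spectrum = np.abs(np.fft.rfft(x * np.hanning(x.size)))
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    band = (freqs >= lo_hz) & (freqs <= hi_hz)
    if not band.any():
        raise ValueError("signal too short to resolve the requested band")
    peak_hz = freqs[band][np.argmax(spectrum[band])]
    return 60.0 * peak_hz


# Synthetic 60 s trace at an assumed 30 fps camera rate:
# a 1.2 Hz cardiac component plus a 0.25 Hz respiratory component.
fs = 30.0
t = np.arange(0, 60, 1.0 / fs)
trace = np.sin(2 * np.pi * 1.2 * t) + 0.8 * np.sin(2 * np.pi * 0.25 * t)

hr_bpm = dominant_rate_per_minute(trace, fs, 0.7, 3.0)   # assumed HR band
rr_bpm = dominant_rate_per_minute(trace, fs, 0.1, 0.5)   # assumed RR band
```

On this noise-free synthetic trace the two calls should recover roughly 72 beats and 15 breaths per minute. Real video-derived signals would need the illumination handling that the method above targets before a simple peak search like this becomes reliable.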