Remote photoplethysmography (rPPG) based on computer vision is widely used to estimate the heart rate (HR) from facial videos. Existing rPPG techniques are subject to several limitations [e.g., highly redundant spatial information, head-movement noise, and region of interest (ROI) selection]. To address these limitations, this article introduces an effective spatial-temporal attention network. A temporal fusion module is first proposed to fully exploit the time-domain information, reducing redundant video information and strengthening long-range temporal associations across frames. Furthermore, a spatial attention mechanism is designed in the backbone network to precisely target the skin ROIs. Finally, to assist the network in learning the weights between channels, we project the RGB images using the plane orthogonal to skin (POS) algorithm and add a motion representation to complement the extraction of physiological signals. Extensive experiments on the public PURE, MMSE-HR, and UBFC-rPPG datasets demonstrate that our model achieves competitive results compared with other methods.
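For reference, the POS projection mentioned above can be sketched as follows; this is a minimal illustration of the standard POS algorithm (Wang et al., 2017), not the paper's exact preprocessing. The input `rgb_trace` (a per-frame spatial average of skin-pixel RGB values) and the 1.6 s sliding window are assumptions for the sketch.

```python
import numpy as np

def pos_projection(rgb_trace, fps=30.0):
    """Project a spatially averaged RGB trace of shape (T, 3) onto the
    plane orthogonal to the skin-tone axis, yielding a 1-D pulse signal."""
    T = rgb_trace.shape[0]
    win = int(1.6 * fps)                    # ~1.6 s sliding window (assumed)
    P = np.array([[0.0, 1.0, -1.0],         # fixed POS projection matrix,
                  [-2.0, 1.0, 1.0]])        # orthogonal to the skin-tone direction
    h = np.zeros(T)
    for t in range(T - win + 1):
        c = rgb_trace[t:t + win]                         # (win, 3) window
        cn = c / (c.mean(axis=0) + 1e-9)                 # temporal normalization
        s = cn @ P.T                                     # (win, 2) projected signals
        pulse = s[:, 0] + (s[:, 0].std() / (s[:, 1].std() + 1e-9)) * s[:, 1]
        h[t:t + win] += pulse - pulse.mean()             # overlap-add into output
    return h
```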