Abstract

Background: The use of remote photoplethysmography (rPPG) to estimate the blood volume pulse in a non-contact manner has been an active research topic in recent years. Existing methods are primarily based on a single-scale region of interest (ROI). However, some noise signals that are not easily separated in a single-scale space can be easily separated in a multi-scale space. In addition, existing spatiotemporal networks mainly focus on local spatiotemporal information and do not emphasize temporal information, which is crucial in pulse extraction problems, resulting in insufficient spatiotemporal feature modeling.

Methods: Here, we propose a multi-scale facial video pulse extraction network based on separable spatiotemporal convolution (SSTC) and dimension-separable attention (DSAT). First, to address the limitation of a single-scale ROI, we constructed a multi-scale feature space for initial signal separation. Second, SSTC and DSAT were designed for efficient spatiotemporal correlation modeling, which increased the information interaction between the long-span time and space dimensions and placed more emphasis on temporal features.

Results: The signal-to-noise ratio (SNR) of the proposed network reached 9.58 dB on the PURE dataset and 6.77 dB on the UBFC-rPPG dataset, outperforming state-of-the-art algorithms.

Conclusions: The results showed that fusing multi-scale signals yielded better results than methods based on only single-scale signals. The proposed SSTC and dimension-separable attention mechanism will contribute to more accurate pulse signal extraction.
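To illustrate the general idea behind separable spatiotemporal convolution, the sketch below factors a 3D convolution into a per-frame spatial convolution followed by a per-location temporal convolution, so that temporal correlations are modeled explicitly. This is a minimal illustration of the factorization concept under assumed PyTorch conventions; the module name `SeparableSTConv` and all hyperparameters are hypothetical and do not reproduce the authors' published SSTC implementation.

```python
# Minimal sketch of a separable spatiotemporal convolution block (assumption:
# PyTorch; names and shapes are illustrative, not the paper's implementation).
import torch
import torch.nn as nn


class SeparableSTConv(nn.Module):
    """Factor a 3D convolution into a spatial (1 x k x k) convolution
    followed by a temporal (k x 1 x 1) convolution."""

    def __init__(self, in_ch: int, out_ch: int, k: int = 3):
        super().__init__()
        pad = k // 2
        # Spatial convolution: operates within each frame.
        self.spatial = nn.Conv3d(in_ch, out_ch, kernel_size=(1, k, k),
                                 padding=(0, pad, pad))
        # Temporal convolution: operates across frames at each spatial location.
        self.temporal = nn.Conv3d(out_ch, out_ch, kernel_size=(k, 1, 1),
                                  padding=(pad, 0, 0))
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (batch, channels, time, height, width).
        return self.act(self.temporal(self.act(self.spatial(x))))


if __name__ == "__main__":
    # Example: a clip of 32 frames at 64 x 64 resolution with 3 channels.
    clip = torch.randn(1, 3, 32, 64, 64)
    block = SeparableSTConv(in_ch=3, out_ch=16)
    print(block(clip).shape)  # torch.Size([1, 16, 32, 64, 64])
```

Compared with a full (k x k x k) kernel, this factorization reduces the parameter count and lets the temporal stage be tuned independently, which is in the spirit of the temporal emphasis described in the abstract.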
