Abstract

Remote heart rate estimation aims to predict cardiac activity signals from facial videos without any physical contact, and has recently shown promising results. However, existing estimation methods based on deep convolutional networks rely on rigid receptive fields and ignore potential spatial correlations among different facial regions, making them prone to overfitting to noise and motion interference unrelated to cardiac activity. To address these issues, this paper proposes PhysGCN, an end-to-end spatiotemporal graph convolutional network with hyperbolic embedding, which coordinates the contributions of intra- and inter-frame features of facial videos for long-term heart rate estimation. Specifically, we first convert the facial video captured by the vision system into a graph-structured spatiotemporal representation, using the graph's edge set to fix the relative spatial positions of the skin sub-regions produced by intra-frame face segmentation and projection. Second, to purify the signal and suppress interference from heart-rate-irrelevant features, a hyperbolic embedding module measures the similarity between sub-regions of the graph in a non-Euclidean space, which characterizes their correlations more distinctly than Euclidean space. Finally, a graph convolutional module dynamically combines the inherent temporal features with the learned spatial features to produce reliable heart rate waveforms. We conduct extensive comparative experiments and ablation studies on multiple public datasets to verify the superiority and robustness of our method. Experiments show that our method effectively estimates heart rate from facial videos, with performance that surpasses or matches state-of-the-art methods.
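The abstract does not specify which hyperbolic model PhysGCN uses, but as a rough illustration of measuring sub-region similarity in a non-Euclidean space, the sketch below computes the geodesic distance in the Poincaré ball, a common choice of hyperbolic embedding space. The function name `poincare_distance` and the toy sub-region embeddings are hypothetical, not taken from the paper.

```python
import numpy as np

def poincare_distance(x, y, eps=1e-7):
    """Geodesic distance between two points in the Poincare ball model.

    Both points must lie strictly inside the unit ball (||.|| < 1).
    Smaller distances indicate more similar embeddings.
    """
    sq_diff = np.sum((x - y) ** 2)
    sq_x = np.sum(x ** 2)
    sq_y = np.sum(y ** 2)
    # Standard closed form of the Poincare-ball metric; eps guards
    # against division by zero for points near the boundary.
    arg = 1.0 + 2.0 * sq_diff / ((1.0 - sq_x) * (1.0 - sq_y) + eps)
    return np.arccosh(arg)

# Toy example: similarity between two (hypothetical) facial
# sub-region embeddings produced by segmentation and projection.
region_a = np.array([0.10, 0.20, -0.05])
region_b = np.array([0.30, -0.10, 0.15])
print(poincare_distance(region_a, region_b))
```

Because the Poincaré metric grows rapidly as points approach the ball's boundary, nearby embeddings of correlated skin regions separate more sharply from outliers than they would under a plain Euclidean distance, which is the intuition behind measuring similarity in hyperbolic rather than planar space.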
