Abstract

Facial video-based blood volume pulse (BVP) signal measurement holds great potential for remote health monitoring, but existing methods are limited by the restricted receptive field of convolutional kernels. This article proposes an end-to-end multi-level constrained spatiotemporal representation structure for facial video-based BVP signal measurement. First, an intra- and inter-subject feature representation is proposed to strengthen the generation of BVP-related features at the high semantic and shallow levels, respectively. Second, a global-local association is presented to enhance learning of the periodic pattern of the BVP signal: global temporal features are introduced into the local spatial convolution of each frame through adaptive kernel weights. Finally, the fused multi-dimensional features are mapped to a one-dimensional BVP signal by a task-oriented signal estimator. Experimental results on the publicly available MMSE-HR dataset demonstrate that the proposed structure outperforms state-of-the-art methods (e.g., AutoHR) in BVP signal measurement, with a 20% and 40% reduction in mean absolute error and root mean squared error, respectively. The proposed structure could be a powerful tool for telemedicine and non-contact heart health monitoring.
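To make the global-local association concrete, below is a minimal, illustrative PyTorch sketch of the general idea: a global temporal descriptor of the clip is used to produce adaptive kernel weights that modulate the spatial convolution applied to each frame. All names (GlobalLocalAssociation, n_kernels, the routing MLP) and the specific kernel-mixing scheme are assumptions for illustration, not the paper's implementation.

```python
# Hypothetical sketch: global temporal features steer per-frame spatial
# convolution via adaptive kernel weights (dynamic kernel mixing).
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalLocalAssociation(nn.Module):
    def __init__(self, channels, n_kernels=4, kernel_size=3):
        super().__init__()
        # Bank of candidate spatial kernels; each frame mixes them adaptively.
        self.kernels = nn.Parameter(
            torch.randn(n_kernels, channels, channels, kernel_size, kernel_size) * 0.02
        )
        # Maps the global temporal descriptor to per-frame mixing weights.
        self.router = nn.Sequential(
            nn.Linear(channels, channels // 2),
            nn.ReLU(inplace=True),
            nn.Linear(channels // 2, n_kernels),
        )
        self.pad = kernel_size // 2

    def forward(self, x):
        # x: (B, C, T, H, W) spatiotemporal feature volume.
        b, c, t, h, w = x.shape
        frame_feat = x.mean(dim=(3, 4))                      # (B, C, T) per-frame descriptor
        global_feat = frame_feat.mean(dim=2, keepdim=True)   # (B, C, 1) clip-level descriptor
        ctx = 0.5 * (frame_feat + global_feat)               # each frame "sees" the whole clip
        mix = torch.softmax(self.router(ctx.transpose(1, 2)), dim=-1)  # (B, T, K)
        out = []
        for ti in range(t):
            # Per-frame adaptive kernel: convex combination of the kernel bank.
            w_t = torch.einsum("bk,kocij->bocij", mix[:, ti], self.kernels)
            frame = x[:, :, ti]                              # (B, C, H, W)
            # Grouped-conv trick applies a different kernel to each batch item.
            y = F.conv2d(
                frame.reshape(1, b * c, h, w),
                w_t.reshape(b * c, c, *w_t.shape[-2:]),
                padding=self.pad,
                groups=b,
            )
            out.append(y.reshape(b, c, h, w))
        return torch.stack(out, dim=2)                       # (B, C, T, H, W)

# Usage example (shapes only): an 8-frame clip of 64-channel feature maps.
# y = GlobalLocalAssociation(64)(torch.randn(2, 64, 8, 32, 32))  # -> (2, 64, 8, 32, 32)
```

The sketch only illustrates the coupling direction described in the abstract (temporal context conditioning spatial kernels); the actual structure, kernel parameterization, and fusion details would follow the paper's full text.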
