Abstract

Recent research has shown that a person's heart rate (HR) can be estimated from video data through remote photoplethysmography (rPPG). However, this approach faces several challenges: training data cannot be prepared to cover all realistic conditions, complex models slow down inference, and the lack of interpretability hinders medical and healthcare applications. To tackle these issues, a lightweight and interpretable convolutional neural network (CNN) is proposed for real-time HR monitoring with a low-cost video camera under realistic conditions. The MediaPipe framework is leveraged to construct a face detection and tracking pipeline that is robust to head movements and illumination changes. Empirical mode decomposition (EMD) is then combined with a channel-wise attention-based CNN for HR inference. Additionally, a temporal long-term peak merge method is proposed as a post-processing step to further improve the accuracy of the network's predictions. Linear regression and Bland-Altman analysis demonstrate agreement between the estimated HR values and the ground truth. Moreover, experimental results show no significant difference in inference time whether the proposed method runs with or without a GPU, and inference on a mobile CPU stays within 100 ms, enabling real-time HR monitoring. Furthermore, this study provides pioneering empirical evidence toward opening the black box of neural networks in HR monitoring from rPPG signals.
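To make the described pipeline concrete, the following is a minimal, illustrative sketch of its signal-extraction stage: MediaPipe face detection isolates a facial region in each frame, the mean green-channel intensity over time forms the raw rPPG trace, EMD selects a heart-band component, and peak spacing yields an HR estimate. The attention CNN and the long-term peak merge step are not reproduced here, and all function names, parameters, and thresholds (extract_rppg_signal, the 0.7-4 Hz band, the assumed 30 fps frame rate) are assumptions for illustration, not the paper's implementation.

# Illustrative sketch only; not the paper's implementation.
import cv2
import numpy as np
import mediapipe as mp
from PyEMD import EMD
from scipy.signal import find_peaks

FPS = 30  # assumed camera frame rate

def extract_rppg_signal(frames):
    """Mean green-channel intensity of the detected face, per frame."""
    detector = mp.solutions.face_detection.FaceDetection(
        model_selection=0, min_detection_confidence=0.5)
    signal = []
    for frame in frames:  # BGR frames, e.g. from cv2.VideoCapture
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        result = detector.process(rgb)
        if not result.detections:
            signal.append(signal[-1] if signal else 0.0)  # hold last value
            continue
        box = result.detections[0].location_data.relative_bounding_box
        h, w, _ = frame.shape
        x0, y0 = max(int(box.xmin * w), 0), max(int(box.ymin * h), 0)
        x1, y1 = x0 + int(box.width * w), y0 + int(box.height * h)
        roi = rgb[y0:y1, x0:x1]
        signal.append(float(roi[..., 1].mean()))  # green channel
    return np.asarray(signal)

def estimate_hr(signal, fps=FPS):
    """Denoise via EMD, then estimate HR from peak spacing."""
    imfs = EMD()(signal - signal.mean())
    # Keep the first IMF whose dominant frequency lies in the HR band
    # (0.7-4 Hz, i.e. roughly 42-240 bpm); an assumed selection rule.
    best = None
    for imf in imfs:
        freqs = np.fft.rfftfreq(len(imf), d=1.0 / fps)
        peak_f = freqs[np.argmax(np.abs(np.fft.rfft(imf)))]
        if 0.7 <= peak_f <= 4.0:
            best = imf
            break
    if best is None:
        best = signal - signal.mean()
    # Peaks must be at least 0.25 s apart (240 bpm upper bound).
    peaks, _ = find_peaks(best, distance=int(0.25 * fps))
    if len(peaks) < 2:
        return float("nan")
    return 60.0 * fps / np.diff(peaks).mean()

A windowed version of estimate_hr, applied to overlapping segments of the trace, would be the natural place to attach the CNN inference and the temporal peak-merge post-processing that the abstract describes.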
