Remote photoplethysmography (rPPG) is a non-contact technique for detecting physiological parameters from facial video, and it has been widely used to measure heart rate, respiration, blood oxygen, and other physiological parameters. Most current research in this field relies on deep learning methods, which typically require a large number of data samples to train a neural network that yields accurate measurements. However, the high cost of labeling these samples limits the adoption and application of the technique. To address this problem, we propose a new self-supervised framework that learns to estimate rPPG signals from facial videos without labeled data. The framework expands unlabeled video samples into multiple positive and negative samples and uses contrastive learning to train an rPPG signal estimation network, which outputs the rPPG signal corresponding to a face video. To promote the convergence of network training, we design a new frequency loss function. This function reduces the distance between the frequencies of similar sample signals and increases the distance between the frequencies of dissimilar sample signals, which enhances frequency consistency among sample signals and makes it easier for the model to learn the differences between sample pairs. We evaluate our method on four standard rPPG datasets. The experimental results show that, without using any labeled data, its accuracy is close to that of the current best supervised method and superior to that of the previous self-supervised method.
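The idea behind a frequency loss of this kind can be illustrated with a minimal sketch. The abstract does not give the exact form of the loss, so everything below is an assumption made for illustration: the function names `normalized_psd` and `frequency_contrastive_loss`, the 0.7 to 4.0 Hz band, and the choice of mean squared distance between normalized power spectra are hypothetical and are not the formulation used in the paper.

```python
import numpy as np


def normalized_psd(signal, fs, band=(0.7, 4.0)):
    """Power spectrum restricted to an assumed heart-rate band, normalized to sum to 1."""
    signal = np.asarray(signal, dtype=float)
    signal = signal - signal.mean()  # remove the DC component
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    power = np.abs(np.fft.rfft(signal)) ** 2
    mask = (freqs >= band[0]) & (freqs <= band[1])
    power = power[mask]
    return power / (power.sum() + 1e-12)  # epsilon guards against division by zero


def frequency_contrastive_loss(anchor, positive, negative, fs):
    """Illustrative loss (an assumption, not the paper's definition).

    It decreases when the anchor and positive spectra move closer together
    and when the anchor and negative spectra move further apart.
    """
    p_a = normalized_psd(anchor, fs)
    p_p = normalized_psd(positive, fs)
    p_n = normalized_psd(negative, fs)
    pos_dist = np.mean((p_a - p_p) ** 2)  # distance to pull down
    neg_dist = np.mean((p_a - p_n) ** 2)  # distance to push up
    return pos_dist - neg_dist


# Synthetic stand-ins for estimated rPPG signals: 10 s sampled at 30 Hz.
fs = 30.0
t = np.arange(300) / fs
anchor = np.sin(2 * np.pi * 1.2 * t)          # 1.2 Hz pulse (72 bpm)
positive = np.sin(2 * np.pi * 1.2 * t + 0.5)  # same frequency, shifted phase
negative = np.sin(2 * np.pi * 2.0 * t)        # different frequency (120 bpm)

loss_correct = frequency_contrastive_loss(anchor, positive, negative, fs)
loss_swapped = frequency_contrastive_loss(anchor, negative, positive, fs)
```

With these synthetic signals, the loss is lower when the positive sample shares the anchor's frequency than when the roles of the positive and negative samples are swapped, which is the behavior the abstract describes. A trainable version would compute the same quantities with a differentiable framework on the network's outputs.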