The pulse condition, also called mai xiang, is a significant diagnostic indicator in traditional Chinese medicine (TCM), and demand for its intelligent diagnosis has grown in recent years. However, existing intelligent pulse condition diagnosis methods are mostly sensor-based and contact-dependent, which causes considerable inconvenience, especially in mobile healthcare. Recently, remote photoplethysmography (rPPG), which extracts physiological information from facial videos, has been widely used in non-contact physiological measurement, such as blood pressure and heart rate monitoring. Inspired by these studies, this paper proposes PulseNet, a novel end-to-end rPPG-based method for non-contact pulse condition diagnosis. In PulseNet, a transformer extracts spatio-temporal features from facial videos, multi-scale fusion progressively enhances the feature representation, and multi-task learning performs the final pulse condition diagnosis. In addition, we propose a spatio-temporal difference attention (STDA) block within the transformer that aggregates spatio-temporal difference clues from local regions to provide fine-grained spatio-temporal context. Extensive experiments show that PulseNet outperforms state-of-the-art rPPG-based methods. To the best of our knowledge, this is the first attempt to detect pulse conditions from facial videos, making PulseNet a promising non-contact diagnosis approach for mobile healthcare in TCM.
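The abstract does not detail the STDA block's internals, but the idea of aggregating spatio-temporal difference clues can be illustrated with a minimal sketch. The following code is a hypothetical simplification (function name, shapes, and the temperature parameter `tau` are assumptions, not the paper's design): it computes frame-to-frame temporal differences over a facial-video clip and turns their magnitudes into a spatial softmax attention map that re-weights the difference signal, emphasising local regions with subtle pulse-induced changes.

```python
import numpy as np

def stda_sketch(frames, tau=1.0):
    """Hypothetical sketch of spatio-temporal difference attention.

    frames: array of shape (T, H, W), a grayscale video clip.
    Returns an attention map over spatial positions for each pair of
    consecutive frames, plus the attention-weighted difference clues.
    """
    # Temporal differences between consecutive frames: (T-1, H, W).
    diffs = np.diff(frames, axis=0)
    # Magnitude of change per pixel, scaled by a temperature (assumed).
    scores = np.abs(diffs) / tau
    # Softmax over the spatial positions of each difference map,
    # so each map's attention weights sum to 1.
    flat = scores.reshape(scores.shape[0], -1)
    flat = flat - flat.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(flat) / np.exp(flat).sum(axis=1, keepdims=True)
    attn = attn.reshape(scores.shape)
    # Difference clues re-weighted by the attention map.
    attended = attn * diffs
    return attn, attended
```

In the paper's actual model these clues would feed a transformer's attention layers rather than stand alone; this sketch only shows how local temporal differences can provide fine-grained spatial weighting.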