Annotating large-scale photoplethysmographic (PPG) data with physiological parameter labels for deep learning is challenging and resource-intensive. While self-supervised representation learning (SSRL) can mitigate the scarcity of annotated data, the challenge lies in learning robust shared representations from vast unlabeled data and integrating diverse contextual cues to learn distinctive representations. To address these challenges, a generative SSRL framework, TS2TC, is proposed that jointly exploits the temporal, spectrogram, and mixed temporal-spectrogram domains to capture the distinctive characteristics of PPG for universal, non-invasive physiological parameter estimation. First, a pretext task named Cross-Temporal Fusion Generative Anchor (CTFGA) is designed: it models temporal dependencies and reconstructs independent segments at a coarse granularity, yielding robust global feature extraction and local semantic context. The framework also incorporates PPG sub-signals at diverse frequency scales, together with higher-order derivatives that reflect hemodynamics, to facilitate learning shared representations at multiple semantic levels. Second, a cognition-inspired dual-process transfer (DPT) strategy is formulated, consisting of a prior-dependent autonomous process and a posterior observation-reasoning process, to exploit the individual and combined strengths of shared and task-specific representations. Furthermore, TS2TC introduces a novel bilinear temporal-spectrogram fusion method in the mixed domain that aligns latent representations across domains and establishes fine-grained, feature-level contextual interactions among multiple information sources. Extensive experiments on physiological parameter estimation tasks show that the combination of CTFGA and DPT significantly outperforms standard generative learning. With only 10% of the training data, TS2TC achieves an average 2.49% RMSE improvement over current state-of-the-art estimation methods.
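The abstract does not detail CTFGA's internals; the following is a minimal sketch of a generic masked-segment generative pretext task in that spirit, where independent waveform segments are masked and reconstructed while a Transformer models temporal dependencies. The segment length, mask ratio, and encoder architecture are illustrative assumptions, not the paper's actual design.

```python
# Sketch of a masked-segment generative pretext task for PPG (assumed
# design, inspired by the CTFGA description; not the paper's method).
import torch
import torch.nn as nn

class MaskedSegmentPretext(nn.Module):
    def __init__(self, seg_len=32, d_model=64, mask_ratio=0.4):
        super().__init__()
        self.seg_len = seg_len
        self.mask_ratio = mask_ratio
        self.embed = nn.Linear(seg_len, d_model)           # segment -> token
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)  # temporal dependencies
        self.decoder = nn.Linear(d_model, seg_len)         # token -> reconstructed segment

    def forward(self, x):
        # x: (batch, length); split the waveform into non-overlapping segments
        b, length = x.shape
        n = length // self.seg_len
        segs = x[:, : n * self.seg_len].reshape(b, n, self.seg_len)
        mask = torch.rand(b, n, device=x.device) < self.mask_ratio
        tokens = self.embed(segs)
        tokens = tokens.masked_fill(mask.unsqueeze(-1), 0.0)  # zero out masked segments
        recon = self.decoder(self.encoder(tokens))
        # reconstruction loss computed only on the masked segments
        return ((recon - segs) ** 2)[mask].mean()

model = MaskedSegmentPretext()
ppg = torch.randn(8, 512)   # a batch of synthetic PPG windows
loss = model(ppg)
loss.backward()
```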
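Similarly, the bilinear temporal-spectrogram fusion could, under stated assumptions, be sketched with a standard bilinear layer that couples every pair of temporal and spectrogram feature dimensions. The dimensions and the use of nn.Bilinear are hypothetical; the paper's fusion may instead use a factorized or low-rank bilinear form.

```python
# Sketch of bilinear fusion between temporal- and spectrogram-domain
# embeddings (assumed form of the mixed-domain interaction).
import torch
import torch.nn as nn

d_time, d_spec, d_fused = 64, 64, 128
fuse = nn.Bilinear(d_time, d_spec, d_fused)   # z_k = h_t^T W_k h_s + b_k

h_time = torch.randn(8, d_time)   # latent from the temporal-domain encoder
h_spec = torch.randn(8, d_spec)   # latent from the spectrogram-domain encoder
z = torch.relu(fuse(h_time, h_spec))          # fine-grained feature-level interaction
print(z.shape)                                 # torch.Size([8, 128])
```

In such a setup, the fused representation z would feed a downstream regression head for physiological parameter estimation.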