Remote physiology, which involves monitoring vital signs without physical contact, has great potential for various applications. Current remote physiology methods rely on a single camera or radio-frequency (RF) sensor to capture the subtle signatures of vital movements. However, our study shows that fusing deep RGB and RF features from both sensor streams can further improve performance. Because these multimodal features are defined in distinct dimensions and carry varying contextual importance, the main challenge in fusion lies in aligning them effectively and integrating them adaptively under dynamic scenarios. To address this challenge, we propose a novel vital sensing model, named Fusion-Vital, that combines the RGB and RF modalities through newly introduced pairwise input formats and transformer-based fusion strategies. We also perform comprehensive experiments on a newly collected and released remote vital-sign dataset captured with synchronized video and RF sensors, showing that the fusion approach outperforms previous single-sensor baselines in various aspects.
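The abstract does not detail the fusion mechanism, but the general idea of transformer-based cross-modal fusion can be illustrated briefly. The following is a minimal sketch assuming standard multi-head cross-attention, where each modality's feature sequence is projected into a shared space and attends to the other; all module names, dimensions, and layer choices are illustrative assumptions, not the paper's actual Fusion-Vital architecture.

```python
# Minimal sketch of transformer-based RGB-RF feature fusion.
# All names, dimensions, and layer choices are illustrative assumptions;
# this is NOT the actual Fusion-Vital architecture from the paper.
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    def __init__(self, rgb_dim=512, rf_dim=128, d_model=256, num_heads=4):
        super().__init__()
        # Project each modality into a shared embedding space so that
        # features defined in distinct dimensions can be attended jointly.
        self.rgb_proj = nn.Linear(rgb_dim, d_model)
        self.rf_proj = nn.Linear(rf_dim, d_model)
        # Cross-attention: RGB tokens query the RF stream and vice versa,
        # letting each modality weight the other's context adaptively.
        self.rgb_to_rf = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.rf_to_rgb = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)
        self.head = nn.Linear(2 * d_model, 1)  # e.g. per-frame pulse estimate

    def forward(self, rgb_feats, rf_feats):
        # rgb_feats: (batch, T, rgb_dim); rf_feats: (batch, T, rf_dim),
        # assumed time-aligned to the same length T.
        q_rgb = self.rgb_proj(rgb_feats)
        q_rf = self.rf_proj(rf_feats)
        # Each stream attends to the other; residual + norm as in a
        # standard transformer block.
        rgb_ctx, _ = self.rgb_to_rf(q_rgb, q_rf, q_rf)
        rf_ctx, _ = self.rf_to_rgb(q_rf, q_rgb, q_rgb)
        fused = torch.cat([self.norm(q_rgb + rgb_ctx),
                           self.norm(q_rf + rf_ctx)], dim=-1)
        return self.head(fused).squeeze(-1)  # (batch, T) waveform estimate

# Usage with dummy, time-aligned feature sequences:
rgb = torch.randn(2, 150, 512)  # e.g. 150 video frames of deep RGB features
rf = torch.randn(2, 150, 128)   # RF features resampled to the same length
model = CrossModalFusion()
print(model(rgb, rf).shape)     # torch.Size([2, 150])
```

The cross-attention design reflects the challenge stated above: projection layers reconcile the modalities' distinct dimensions, while the attention weights provide the adaptive, context-dependent integration.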