With growing awareness of personal health, using wearable sensors to monitor individual activities and accurately estimate energy expenditure has become an active research focus. However, existing research faces challenges including low estimation accuracy, insufficient use of frequency-domain features, and difficulty in integrating time-domain and frequency-domain features. To address these issues, we propose an innovative framework, the Dual-Stream Fusion Network (DSFN), which combines a Time Domain Encoding (TDE) module, a Frequency Domain Hierarchical-Split Encoding (FDHSE) module, and a Two-Stage Feature Fusion (TSF) module. Specifically, the temporal stream employs the TDE module to capture deep temporal features that reflect the complex dynamic variations in time-series data. The frequency-domain stream introduces the FDHSE module, which extracts frequency-domain features in a multi-level, multi-scale manner, ensuring a comprehensive and diverse representation of frequency information. Through this dual-stream architecture, the model effectively learns both time- and frequency-domain features, addressing the underuse of frequency-domain information in prior studies. Additionally, the TSF module fully integrates time- and frequency-domain features, overcoming the challenge of fusing these two types of features. We conducted experiments on two public datasets: the GOTOV dataset (elderly participants) and the JSI dataset (young participants). Experimental results demonstrate that our method achieves strong performance across different age groups. Compared with baseline models, the proposed DSFN significantly improves the accuracy of human energy expenditure estimation.
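The abstract does not give implementation details, so the following is only a minimal sketch of the dual-stream idea under stated assumptions: the module names (TDE, FDHSE, TSF) come from the abstract, but the specific layers (1-D convolutions for the time stream, FFT sub-band splitting with per-band multi-scale convolutions for the frequency stream, gated two-stage fusion, input/feature dimensions) are hypothetical choices for illustration, not the authors' architecture.

```python
# Illustrative sketch of a dual-stream (time + frequency) fusion network.
# All layer choices and dimensions are assumptions; see lead-in above.
import torch
import torch.nn as nn


class TDE(nn.Module):
    """Time-domain encoder: 1-D convolutions over the raw sensor sequence."""
    def __init__(self, in_ch, dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(in_ch, dim, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(dim, dim, kernel_size=5, padding=2), nn.ReLU(),
        )

    def forward(self, x):                     # x: (batch, channels, time)
        return self.net(x).mean(dim=-1)       # global average pool -> (batch, dim)


class FDHSE(nn.Module):
    """Frequency-domain encoder: FFT magnitudes split into sub-bands
    (hierarchical split), each processed at its own scale, then concatenated."""
    def __init__(self, in_ch, dim, n_splits=4):
        super().__init__()
        self.n_splits = n_splits
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv1d(in_ch, dim // n_splits, kernel_size=2 * k + 1, padding=k),
                nn.ReLU(),
            )
            for k in range(1, n_splits + 1)   # increasing kernel size per band
        )

    def forward(self, x):                     # x: (batch, channels, time)
        spec = torch.fft.rfft(x, dim=-1).abs()          # (batch, channels, freq)
        bands = torch.chunk(spec, self.n_splits, dim=-1)
        feats = [br(b).mean(dim=-1) for br, b in zip(self.branches, bands)]
        return torch.cat(feats, dim=-1)                 # (batch, dim)


class TSF(nn.Module):
    """Two-stage fusion: (1) gated mixing of the two streams, (2) joint MLP head."""
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())
        self.head = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, t_feat, f_feat):
        g = self.gate(torch.cat([t_feat, f_feat], dim=-1))      # stage 1: gating
        fused = torch.cat([g * t_feat, (1 - g) * f_feat], dim=-1)
        return self.head(fused)                                 # stage 2: regression


class DSFN(nn.Module):
    def __init__(self, in_ch=3, dim=64):
        super().__init__()
        self.tde = TDE(in_ch, dim)
        self.fdhse = FDHSE(in_ch, dim)
        self.tsf = TSF(dim)

    def forward(self, x):
        return self.tsf(self.tde(x), self.fdhse(x))


model = DSFN()
ee = model(torch.randn(8, 3, 256))   # 8 windows of 3-axis accelerometer data
print(ee.shape)                      # torch.Size([8, 1]): one EE estimate per window
```

The sketch shows the structural point the abstract makes: the two streams learn complementary representations independently, and the fusion module is responsible for reconciling them rather than simply concatenating raw features.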