With the proliferation of intelligent sensors integrated into mobile devices, fine-grained human activity recognition (HAR) based on lightweight sensors has emerged as a useful tool for personalized applications. Although shallow and deep learning algorithms have been proposed for HAR over the past decades, these methods have limited capability to exploit semantic features from multiple sensor types. To address this limitation, we propose DiamondNet, a novel HAR framework that constructs heterogeneous multisensor modalities and then denoises, extracts, and fuses their features. In DiamondNet, we leverage multiple 1-D convolutional denoising autoencoders (1-D-CDAEs) to extract robust encoder features. We further introduce an attention-based graph convolutional network that constructs new heterogeneous multisensor modalities by adaptively exploiting the latent relationships between different sensors. Moreover, the proposed attentive fusion subnet, which jointly employs a global-attention mechanism and shallow features, calibrates features at different levels across the multiple sensor modalities. This approach amplifies informative features and provides a comprehensive and robust perception for HAR. The efficacy of DiamondNet is validated on three public datasets, where it outperforms state-of-the-art baselines with consistent accuracy improvements. Overall, our work introduces a new perspective on HAR that leverages multiple sensor modalities and attention mechanisms to improve recognition performance.
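To make the idea of an attention-based graph over sensors concrete, the sketch below treats each sensor's encoder feature vector as a graph node and lets every sensor attend to every other sensor. The abstract does not specify DiamondNet's exact formulation, so this uses a swapped-in, GAT-style (graph attention network) layer on a fully connected sensor graph. The function name `sensor_graph_attention`, the parameters `W` and `a`, and the three-sensor setup are hypothetical and chosen for illustration only.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def elu(x):
    # ELU nonlinearity; np.minimum avoids overflow in the unused branch
    return np.where(x > 0, x, np.expm1(np.minimum(x, 0.0)))

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def sensor_graph_attention(H, W, a):
    """Hypothetical GAT-style layer over a fully connected sensor graph.

    H: (num_sensors, in_dim) per-sensor encoder features (e.g. from a CDAE encoder)
    W: (in_dim, out_dim) shared linear projection
    a: (2 * out_dim,) attention vector
    Returns (H_out, alpha); alpha[i, j] is how strongly sensor i attends to sensor j.
    """
    Z = H @ W                      # project every sensor node: (N, out_dim)
    d = Z.shape[1]
    # a^T [z_i || z_j] splits into a_src . z_i + a_dst . z_j
    src = Z @ a[:d]                # (N,)
    dst = Z @ a[d:]                # (N,)
    e = leaky_relu(src[:, None] + dst[None, :])  # (N, N) raw scores, self-loops included
    alpha = softmax(e, axis=1)     # normalize scores over neighbors of each sensor
    H_out = elu(alpha @ Z)         # attention-weighted aggregation of neighbor features
    return H_out, alpha

# Usage with three hypothetical sensors (e.g. accelerometer, gyroscope, magnetometer)
rng = np.random.default_rng(0)
num_sensors, in_dim, out_dim = 3, 16, 8
H = rng.standard_normal((num_sensors, in_dim))
W = rng.standard_normal((in_dim, out_dim)) * 0.1
a = rng.standard_normal(2 * out_dim) * 0.1

H_out, alpha = sensor_graph_attention(H, W, a)
print(H_out.shape)  # (3, 8)
print(alpha)        # each row sums to 1 (up to floating-point error)
```

Because the attention weights are computed from the features themselves rather than fixed in advance, the inter-sensor relationships are learned from data once `W` and `a` are trained. This is the sense in which such a layer can "adaptively" relate sensors. A trained model would also stack layers or use multiple heads, which this sketch omits.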