Abstract

Recent advances in sensor-based human activity recognition (HAR) have exploited deep hybrid networks to improve performance. These hybrid models combine Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) to leverage their complementary strengths, and they achieve impressive results. However, these models do not fully account for the roles of, and associations among, the different sensors, which leads to insufficient multi-modal fusion. Moreover, the RNNs commonly used in HAR suffer from a 'forgetting' defect that makes it difficult to capture long-term information. To tackle these problems, this paper proposes an HAR framework composed of an Inertial Measurement Unit (IMU) fusion block and a ConvTransformer subnet. Inspired by the complementary filter, the IMU fusion block fuses the commonly used sensor modalities according to their physical relationships, so the features of different modalities can be aggregated more effectively. The extracted features are then fed into the ConvTransformer subnet for classification. Thanks to its convolutional layers and self-attention layers, the ConvTransformer can better capture local features and construct long-term dependencies. Extensive experiments on eight benchmark datasets demonstrate the superior performance of our framework. The source code will be published soon.
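
The two building blocks named in the abstract can be made concrete. The sketch below (PyTorch) is a minimal illustration under stated assumptions, not the authors' released implementation: a classical complementary filter fuses gyroscope and accelerometer streams by high-pass filtering the integrated gyroscope angles and low-pass filtering the gravity-derived tilt, and a ConvTransformer-style classifier places a small convolutional front end (local features) before Transformer self-attention layers (long-term dependencies). All names and hyperparameters (complementary_filter, ConvTransformerNet, alpha, dt, d_model, and so on) are hypothetical.

import torch
import torch.nn as nn

def complementary_filter(acc, gyro, dt=0.02, alpha=0.98):
    """Fuse accelerometer and gyroscope samples into orientation angles.

    acc, gyro: tensors of shape (T, 3) holding raw IMU readings.
    The gyroscope is integrated for short-term accuracy, while the
    accelerometer anchors the long-term estimate: the classic
    complementary-filter trade-off the abstract alludes to.
    """
    # Tilt angles implied by gravity in the accelerometer signal.
    acc_roll = torch.atan2(acc[:, 1], acc[:, 2])
    acc_pitch = torch.atan2(-acc[:, 0],
                            torch.sqrt(acc[:, 1] ** 2 + acc[:, 2] ** 2))
    roll, pitch = acc_roll[0], acc_pitch[0]
    fused = []
    for t in range(acc.shape[0]):
        # High-pass the integrated gyro, low-pass the accelerometer estimate.
        roll = alpha * (roll + gyro[t, 0] * dt) + (1 - alpha) * acc_roll[t]
        pitch = alpha * (pitch + gyro[t, 1] * dt) + (1 - alpha) * acc_pitch[t]
        fused.append(torch.stack([roll, pitch]))
    return torch.stack(fused)  # (T, 2) fused orientation channels

class ConvTransformerNet(nn.Module):
    """Convolutional front end for local features, self-attention layers
    for long-term dependencies, as the abstract describes."""
    def __init__(self, in_channels, num_classes,
                 d_model=64, nhead=4, num_layers=2):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(in_channels, d_model, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(d_model, d_model, kernel_size=5, padding=2),
            nn.ReLU(),
        )
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer,
                                             num_layers=num_layers)
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, x):                # x: (batch, channels, time)
        h = self.conv(x)                 # local features per time step
        h = h.transpose(1, 2)            # (batch, time, d_model) for attention
        h = self.encoder(h)              # global, long-range dependencies
        return self.head(h.mean(dim=1))  # temporal average pooling -> logits

# Usage sketch: 4 windows of 8 sensor channels, 128 samples each.
model = ConvTransformerNet(in_channels=8, num_classes=6)
logits = model(torch.randn(4, 8, 128))   # -> (4, 6)

In such a pipeline, the fused orientation channels would typically be concatenated with the raw signals before entering the classifier; the paper's actual fusion block goes further, deriving the multi-modal combination from the sensors' physical relationships rather than using fixed filter coefficients.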
