Abstract
Sensor-based human activity recognition aims to detect various physical activities performed by people using ubiquitous sensors. Unlike existing deep learning-based methods, which mainly extract black-box features from the raw sensor data, we propose a hierarchical multi-view aggregation network built on multi-view feature spaces. Specifically, we first construct multiple views of feature spaces for each individual sensor in terms of white-box features and black-box features. Our model then learns a unified representation of the multi-view features by aggregating the views in a hierarchical context at the feature level, position level, and modality level, with a dedicated aggregation module designed for each level. Based on the ideas of non-local operations and attention, our fusion method captures the correlations between features and leverages the relationships across different sensor positions and modalities. We comprehensively evaluate our method on 12 human activity benchmark datasets, and the resulting accuracy outperforms state-of-the-art approaches.
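As a rough illustration of the fusion idea described above (a minimal sketch, not the authors' implementation; the module name, projections, and tensor shapes are all assumptions), the block below fuses a set of view features with non-local attention over L2-normalized projections:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NonLocalViewFusion(nn.Module):
    """Hypothetical sketch of non-local, attention-based fusion over views.

    Each view (e.g., a white-box or black-box feature vector of one sensor)
    attends to all other views; affinities are computed on L2-normalized
    projections, loosely following the paper's description.
    """

    def __init__(self, dim):
        super().__init__()
        self.theta = nn.Linear(dim, dim)  # query projection
        self.phi = nn.Linear(dim, dim)    # key projection
        self.g = nn.Linear(dim, dim)      # value projection

    def forward(self, views):
        # views: (batch, n_views, dim)
        q = F.normalize(self.theta(views), dim=-1)  # L2-normalized, per the highlights
        k = F.normalize(self.phi(views), dim=-1)
        v = self.g(views)
        # pairwise correlations between views, turned into attention weights
        affinity = torch.softmax(q @ k.transpose(1, 2), dim=-1)
        fused = affinity @ v
        return views + fused  # residual connection, as in non-local blocks
```

Stacking such blocks at the feature, position, and modality levels would realize one plausible version of the hierarchical aggregation the abstract describes.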
Highlights
Human Activity Recognition (HAR) refers to the automatic detection of various physical activities performed by people in their daily lives [1]
We apply a non-local operation augmented with the L2-norm to explore the correlations between different features and fuse them
In the position-level aggregation, we take the correlations between different sensor positions into account by introducing a correlation feature, which enhances the representation of each view and effectively improves the resulting accuracy (a minimal sketch follows this list)
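As a hedged sketch of the position-level idea in the last highlight (not the paper's actual code; the function name, affinity choice, and shapes are assumptions), one way to form such a correlation feature is to weight the other positions' representations by their pairwise affinities and append the result to each view:

```python
import torch
import torch.nn.functional as F

def position_correlation_feature(position_feats):
    """Hypothetical position-level aggregation sketch.

    position_feats: (batch, n_positions, dim), one fused vector per sensor
    position (e.g., wrist, chest, ankle).
    """
    # cosine-style affinities between positions via L2-normalized dot products
    normed = F.normalize(position_feats, dim=-1)
    affinity = torch.softmax(normed @ normed.transpose(1, 2), dim=-1)
    # correlation feature: affinity-weighted mixture of all positions
    corr = affinity @ position_feats
    # enhance each position's view by concatenating its correlation feature
    return torch.cat([position_feats, corr], dim=-1)  # (batch, n_positions, 2*dim)
```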
Summary
Human Activity Recognition (HAR) refers to the automatic detection of various physical activities performed by people in their daily lives [1]. In the sensor-based HAR task, raw data from various modalities is collected and used to infer contextual information for classifying activities. A first challenge is how to construct discriminative feature spaces from the heterogeneous sensor data. Addressing this challenge, early methods applied human domain knowledge to feature engineering for HAR, extracting carefully designed white-box features with different types of methods [5,6,7,8,9]. More recently, deep learning models have had a significant impact on HAR.
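For concreteness, white-box features of this kind are typically simple statistical and spectral descriptors computed over a sliding window of raw sensor readings; the sketch below (with assumed window shapes and feature choices, not necessarily those of refs. [5,6,7,8,9]) illustrates the idea.

```python
import numpy as np

def whitebox_features(window):
    """Hypothetical white-box features for one sensor channel.

    window: 1-D array of raw readings from a fixed-length sliding window.
    Returns a small vector of hand-crafted statistical/spectral features.
    """
    spectrum = np.abs(np.fft.rfft(window))
    return np.array([
        window.mean(),               # time-domain statistics
        window.std(),
        window.max() - window.min(),
        spectrum[1:].argmax() + 1,   # dominant (non-DC) frequency bin
        (spectrum ** 2).sum(),       # spectral energy
    ])
```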