Abstract

Human action recognition (HAR) has attracted considerable attention in computer vision and pattern recognition because of its wide range of applications. In most action recognition work based on deep learning and multisensor data, sensor sequences are first stacked into 2-D images and then fed into convolutional neural network (CNN) models to extract features. This data-level fusion approach learns features across sensors but loses the distinct characteristics of each sensor. In this article, we propose a new deep learning model called the distributed sensors fusion network (DSFNet). Because acceleration sequences reflect motion trends, we first learn features from each sensor separately and then use a Transformer encoder module to model the dependencies among multisensor actions and extract features. Because angular velocity reflects the direction and speed of local pose changes, we first learn temporal features of a single sensor along its different motion directions and then learn temporal features over the stacked multisensor feature maps. Finally, the outputs of the two modalities are combined by decision-level fusion, which significantly improves the performance of the model. We evaluate the proposed model on the self-built dataset Changzhou University: a comprehensive multi-modal human action dataset (CZU-MHAD), and the experimental results show that DSFNet outperforms existing methods.
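The sketch below illustrates, under stated assumptions, the two-branch, decision-level-fusion idea summarized in the abstract: an acceleration branch whose per-sensor features are passed through a Transformer encoder, an angular-velocity branch with two temporal stages, and an averaging of the two branches' class scores. It is a minimal PyTorch-style illustration; the class name DSFNetSketch, the layer sizes, tensor shapes, and the choice of score averaging for decision-level fusion are assumptions for exposition, not the authors' implementation.

```python
# Minimal sketch of a two-branch network with decision-level fusion,
# assuming PyTorch. All names and sizes are illustrative assumptions.
import torch
import torch.nn as nn


class DSFNetSketch(nn.Module):
    def __init__(self, num_sensors=5, seq_len=100, channels=3,
                 d_model=64, num_classes=22):
        super().__init__()
        # Acceleration branch: per-step embedding, then a Transformer encoder
        # models dependencies across the sensor/time tokens.
        self.acc_embed = nn.Linear(channels, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4,
                                               batch_first=True)
        self.acc_encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.acc_head = nn.Linear(d_model, num_classes)

        # Angular-velocity branch: temporal convolution per sensor, then a
        # second temporal stage over the stacked multisensor feature maps.
        self.gyro_local = nn.Conv1d(channels, d_model, kernel_size=5, padding=2)
        self.gyro_fuse = nn.Conv1d(num_sensors * d_model, d_model,
                                   kernel_size=5, padding=2)
        self.gyro_head = nn.Linear(d_model, num_classes)

    def forward(self, acc, gyro):
        # acc, gyro: (batch, num_sensors, seq_len, channels)
        b, s, t, c = acc.shape

        # Acceleration branch: embed each time step, flatten sensors into the
        # token dimension, and let the encoder attend across sensors and time.
        acc_tokens = self.acc_embed(acc).reshape(b, s * t, -1)
        acc_feat = self.acc_encoder(acc_tokens).mean(dim=1)
        acc_logits = self.acc_head(acc_feat)

        # Angular-velocity branch: per-sensor temporal features, then fusion
        # across sensors with a second temporal convolution.
        gyro_local = self.gyro_local(gyro.reshape(b * s, t, c).transpose(1, 2))
        gyro_local = gyro_local.reshape(b, s * gyro_local.shape[1], t)
        gyro_feat = self.gyro_fuse(gyro_local).mean(dim=2)
        gyro_logits = self.gyro_head(gyro_feat)

        # Decision-level fusion (assumed here as score averaging): combine the
        # per-modality class scores instead of stacking raw data into one image.
        return (acc_logits + gyro_logits) / 2


# Example usage with random data shaped (batch, sensors, time, channels).
model = DSFNetSketch()
acc = torch.randn(2, 5, 100, 3)
gyro = torch.randn(2, 5, 100, 3)
print(model(acc, gyro).shape)  # torch.Size([2, 22])
```

The point of fusing at the decision level, as the abstract argues, is that each modality keeps its own feature extractor suited to its physical meaning, and only the class scores are merged at the end.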
