Abstract
Human Action Recognition (HAR) is an active research area in computer vision and machine learning. Earlier HAR methods relied on a single sensor modality, either vision-based sensors or wearable inertial sensors. Both modalities have limitations that prevent widespread adoption of HAR: visual sensors typically require an elaborate hardware setup and are limited to a small operating area, whereas inertial sensors are prone to drift. The solution is to fuse information from different modalities. In this dissertation, we present novel multimodal sensor fusion frameworks that overcome the limitations of a single sensor modality. In these frameworks, we convert all data streams to images through innovative signal-to-image conversion schemes and feed them to Convolutional Neural Networks (CNNs), enabling the extraction of the higher-level features that CNNs are especially effective at learning from images. Moreover, instead of the more common single-stage fusion, we propose fast and robust multilevel fusion schemes that extract features from multiple layers of the CNNs and combine them using statistical methods such as Canonical Correlation Analysis and gated fusion. We apply these fusion frameworks to HAR using depth and inertial sensors. At the input of each fusion framework, we transform depth and inertial sensor data into images called Sequential Front view Images (SFI) and Signal Images (SI). The SFI and SI are then fused through our proposed multilevel frameworks for more accurate HAR while maintaining computational speed. We evaluate the proposed frameworks on three public multimodal HAR datasets, namely the UTD Multimodal Human Action Dataset (UTD-MHAD), Berkeley MHAD, and UTD-MHAD Kinect V2, achieving accuracies of 99.3%, 99.85%, and 99.8%, respectively.

While the proposed frameworks were developed with HAR as the target application, they can be applied to other fusion problems as well. We show the generalizability of the frameworks by applying them to a different domain, where ECG (1D time series) data is converted to multimodal images and fed through our fusion frameworks for arrhythmia classification and stress assessment. Preliminary results in these applications are encouraging, further strengthening the significance of the proposed frameworks.
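As a rough illustration of the ideas summarized above, and not the dissertation's actual implementation, the following Python sketch shows one plausible way to turn a multi-channel inertial window into a 2D "Signal Image" suitable for a CNN, and to combine depth and inertial feature vectors with a sigmoid gate. The layout, function names, and weights here are hypothetical stand-ins; the dissertation's exact conversion and fusion schemes may differ.

```python
import numpy as np

def signal_image(inertial_window, image_size=64):
    """Stack the channels of an inertial window (channels x time) into a
    2D array and resample it to a square 'Signal Image' (illustrative
    layout; the exact row-ordering scheme in the dissertation may differ)."""
    channels, length = inertial_window.shape
    # Repeat channel rows so the image height matches the target size.
    rows = np.repeat(inertial_window, image_size // channels + 1, axis=0)[:image_size]
    # Resample along time by linear interpolation to the target width.
    t_old = np.linspace(0.0, 1.0, length)
    t_new = np.linspace(0.0, 1.0, image_size)
    img = np.stack([np.interp(t_new, t_old, row) for row in rows])
    # Normalise to [0, 255] so the array can be fed to a CNN like a grayscale image.
    img = (img - img.min()) / (img.max() - img.min() + 1e-8)
    return (img * 255).astype(np.uint8)

def gated_fusion(feat_depth, feat_inertial, w_gate, b_gate):
    """Fuse two feature vectors with a sigmoid gate: the gate decides, per
    dimension, how much each modality contributes (illustrative weights only)."""
    gate = 1.0 / (1.0 + np.exp(-(w_gate @ np.concatenate([feat_depth, feat_inertial]) + b_gate)))
    return gate * feat_depth + (1.0 - gate) * feat_inertial

# Tiny usage example with random stand-in data.
window = np.random.randn(6, 200)           # 6 inertial channels, 200 samples
si = signal_image(window)                   # 64 x 64 Signal Image
fd, fi = np.random.randn(128), np.random.randn(128)
w, b = np.random.randn(128, 256) * 0.01, np.zeros(128)
fused = gated_fusion(fd, fi, w, b)
print(si.shape, fused.shape)                # (64, 64) (128,)
```

In practice the gate weights would be learned jointly with the CNNs, and the features entering the fusion stage would come from multiple CNN layers rather than random vectors; this sketch only conveys the shape of the computation.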