Abstract

Human Action Recognition (HAR) is a rapidly progressing research area in computer vision and machine learning. Earlier methods for HAR were based on a single sensor modality, either a vision-based sensor or a wearable inertial sensor. Both of these modalities have limitations that prevent widespread adoption of HAR; for example, visual sensors typically require an elaborate hardware setup and are limited to a small operating area, whereas inertial sensors are prone to drift. A solution is to fuse information from different modalities. In this dissertation, we present novel multimodal sensor fusion frameworks that overcome the limitations of a single sensor modality. In these frameworks, we convert all data streams to images through innovative signal-to-image conversion schemes and feed them to Convolutional Neural Networks (CNNs), enabling extraction of the higher-level features that CNNs are known to capture, especially from images. Moreover, instead of the more common single-stage fusion, we propose fast and robust multilevel fusion schemes that extract features from multiple layers of the CNNs and combine them using statistical methods such as Canonical Correlation Analysis and gated fusion. We apply these fusion frameworks to HAR using depth and inertial sensors. At the input of each fusion framework, we transform depth and inertial sensor data into images called Sequential Front view Images (SFI) and Signal Images (SI). The SFI and SI images are then fused through our proposed multilevel frameworks for more accurate HAR while maintaining computational speed. We evaluate the proposed frameworks on three public multimodal HAR datasets, namely the UTD Multimodal Human Action Dataset (UTD-MHAD), Berkeley MHAD, and UTD-MHAD Kinect V2, achieving accuracies of 99.3%, 99.85%, and 99.8%, respectively.

While the proposed frameworks were developed with HAR as the target application, they can be applied to other fusion problems as well. We show the generalizability of the frameworks by applying them to a different domain, where ECG (1D time-series) data is converted to multimodal images and fed through our fusion frameworks for arrhythmia classification and stress assessment. Preliminary results in these applications are encouraging, further strengthening the significance of the proposed frameworks.
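For illustration, the sketch below shows one plausible way the two ideas summarized above could be realized in code: stacking inertial sensor channels row-wise into a 2D "signal image", and gating feature vectors from two CNN branches (one per modality) before classification. The tensor shapes, layer sizes, number of classes, and the `signal_to_image` and `GatedFusion` names are illustrative assumptions for this sketch, not the dissertation's actual implementation.

```python
# Illustrative sketch only: shapes, layer sizes, and module names are
# assumptions for demonstration, not the dissertation's implementation.
import torch
import torch.nn as nn

def signal_to_image(signal: torch.Tensor) -> torch.Tensor:
    """Stack inertial channels row-wise into a single-channel 2D image.

    signal: (channels, timesteps), e.g. 6 channels of accelerometer + gyroscope.
    Returns a (1, channels, timesteps) tensor min-max normalized to [0, 1].
    """
    sig = (signal - signal.min()) / (signal.max() - signal.min() + 1e-8)
    return sig.unsqueeze(0)

class SmallCNN(nn.Module):
    """Tiny CNN branch producing a fixed-length feature vector."""
    def __init__(self, out_dim: int = 64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.fc = nn.Linear(16 * 4 * 4, out_dim)

    def forward(self, x):
        return self.fc(self.features(x).flatten(1))

class GatedFusion(nn.Module):
    """Learned gate that weights the two modality features before fusing."""
    def __init__(self, dim: int = 64, num_classes: int = 27):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, f_depth, f_inertial):
        g = self.gate(torch.cat([f_depth, f_inertial], dim=1))
        fused = g * f_depth + (1 - g) * f_inertial   # gated convex combination
        return self.classifier(fused)

# Toy forward pass: a stand-in depth-derived image and a 6-channel inertial clip.
depth_img = torch.rand(1, 1, 64, 64)                           # stand-in for an SFI
inertial = signal_to_image(torch.randn(6, 128)).unsqueeze(0)   # stand-in for an SI
branch_depth, branch_inertial = SmallCNN(), SmallCNN()
logits = GatedFusion()(branch_depth(depth_img), branch_inertial(inertial))
print(logits.shape)  # torch.Size([1, 27])
```

In this sketch the gate learns, per feature dimension, how much to trust each modality; a multilevel variant would apply such a fusion step at several CNN layers rather than only at the final feature vectors.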
