Abstract

The problem of human activity recognition (HAR) has been attracting increasing attention from the research community, as it has many applications. In this paper we propose a multi-modal approach to the task of video-based HAR. Our approach uses three modalities: raw RGB video data, depth sequences, and 3D skeletal motion data. The latter is transformed into a 2D image representation in the spectral domain. To extract spatio-temporal features from the available data, we propose a novel hybrid deep neural network architecture that combines a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network. We focus on the recognition of activities of daily living (ADLs) and medical conditions, and we evaluate our approach on two challenging datasets.

Keywords: Human activity recognition · Convolutional neural networks · Long short-term memory networks · Multimodal analysis
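The abstract states that the 3D skeletal motion data is mapped to a 2D image representation in the spectral domain, but does not specify the transform. A common way to realize such a mapping, shown here purely as an illustrative sketch (the function name, FFT-magnitude choice, and normalization are assumptions, not the paper's method), is to treat each joint coordinate as a 1D time signal and stack the magnitude spectra into an image that a CNN can consume:

```python
import numpy as np

def skeleton_to_spectral_image(seq):
    """Map a skeletal sequence (T frames, J joints, 3 coords) to a 2D
    spectral-domain image: frequency bins x joint-coordinate channels.
    Illustrative sketch only; the paper's exact transform is not given
    in the abstract."""
    T, J, C = seq.shape
    signals = seq.reshape(T, J * C)                   # one 1D signal per coordinate
    spectrum = np.abs(np.fft.rfft(signals, axis=0))   # magnitude spectrum over time
    # Normalize to [0, 255] so the result can be treated like a grayscale image.
    spectrum -= spectrum.min()
    maxv = spectrum.max()
    if maxv > 0:
        spectrum /= maxv
    return (spectrum * 255).astype(np.uint8)

# Toy example: 64 frames, 25 joints (e.g., a Kinect v2 skeleton), 3D coordinates.
rng = np.random.default_rng(0)
seq = rng.normal(size=(64, 25, 3))
img = skeleton_to_spectral_image(seq)
print(img.shape)  # (33, 75): 64//2 + 1 frequency bins x 25*3 channels
```

The resulting 2D array can then be fed to the CNN branch of a hybrid CNN-LSTM network alongside the RGB and depth streams.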
