Abstract

Smart homes have become central to building sustainability, and recognizing human activity in smart homes is a key enabler of home automation. Recently, two-stream Convolutional Neural Networks (CNNs) have shown promising performance for video-based human action recognition. However, such models cannot act directly on 3D skeletal sequences because they are limited to 2D image and video inputs. Given how effectively 3D skeletal data describes human activity, in this study we present a novel method to recognize skeletal human activity in sustainable smart homes using a CNN fusion model. Our method represents the spatiotemporal information of each 3D skeletal sequence as three images and three image sequences through gray-value encoding, referred to as skeletal trajectory shape images (STSIs) and skeletal pose image (SPI) sequences, and builds a CNN fusion model that takes the three STSIs and three SPI sequences as input for skeletal activity recognition. The three STSIs and three SPI sequences are generated in three orthogonal planes so that they complement each other. The proposed CNN fusion model allows hierarchical learning of spatiotemporal features, yielding better action recognition performance. Experimental results on three public datasets show that our method outperforms state-of-the-art methods.
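To make the encoding step concrete, the following is a minimal sketch (not the authors' implementation) of how a 3D skeletal sequence might be projected onto three orthogonal planes and gray-value encoded into SPI sequences and STSIs. The function name, image size, and the choice to encode temporal order as brightness in the STSI are assumptions for illustration.

```python
import numpy as np

def encode_skeleton_sequence(seq, size=64):
    """Hypothetical sketch: map a 3D skeletal sequence of shape (T, J, 3)
    (T frames, J joints, xyz coordinates) into three skeletal pose image (SPI)
    sequences and three skeletal trajectory shape images (STSIs),
    one per orthogonal plane (xy, yz, xz)."""
    planes = [(0, 1), (1, 2), (0, 2)]           # xy, yz, xz projections
    lo = seq.min(axis=(0, 1))
    hi = seq.max(axis=(0, 1))
    norm = (seq - lo) / (hi - lo + 1e-8)        # normalize each axis to [0, 1]
    gray = np.uint8(255 * norm)                 # gray-value encoding
    spi_seqs, stsis = [], []
    for a, b in planes:
        # SPI sequence: one J x 2 gray-valued pose array per frame
        spi_seqs.append(gray[:, :, [a, b]])
        # STSI: rasterize all joint trajectories into one 2D shape image,
        # with pixel brightness encoding temporal order (an assumption here)
        img = np.zeros((size, size), dtype=np.uint8)
        u = np.clip((norm[:, :, a] * (size - 1)).astype(int), 0, size - 1)
        v = np.clip((norm[:, :, b] * (size - 1)).astype(int), 0, size - 1)
        t = np.uint8(255 * np.arange(len(seq)) / max(len(seq) - 1, 1))
        img[v, u] = t[:, None]
        stsis.append(img)
    return spi_seqs, stsis
```

The three SPI sequences and three STSIs produced this way would then be fed to the separate streams of the CNN fusion model, with each plane acting as a complementary view of the same action.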

Highlights

  • Smart homes are a key technology for sustainable buildings

  • We propose a novel method that combines 3D skeletal sequence mapping with a Convolutional Neural Network (CNN) fusion model for skeletal action recognition

  • We propose a novel CNN fusion model, built on a two-stream architecture, that takes three skeletal pose image (SPI) sequences and three skeletal trajectory shape images (STSIs) as input, enabling hierarchical learning of spatiotemporal features of skeletal trajectory shape and pose for human action recognition

Introduction

Smart homes are a key technology for sustainable buildings. In the development of this generation of smart homes, vision-based action analysis methods are of central importance, since they allow human occupants to interact with household appliances through their physical actions alone, instead of a mouse, keyboard, touchscreen, or remote control. With the rapid development of RGB-D sensors (e.g., Microsoft Kinect) and low-cost real-time full-body tracking [1, 2], 3D skeletal action analysis has drawn great attention [3,4,5], and such methods are better suited to smart home technology than RGB-based methods. In the work by Du et al. [10], the skeletal joints are divided into five sets corresponding to five body parts. In the work by Zhu et al. [11], the skeletal joints are fed to a deep LSTM at each time slot to learn the inherent co-occurrence features of skeletal joints. In the study by Song et al. [13], both the spatial and temporal information of skeletal sequences are learned with a spatial-temporal LSTM. However, RNNs tend to overemphasize temporal information, especially when training data is insufficient, leading to overfitting [14].
