Abstract

Data augmentation is critical for deep learning-based human activity recognition (HAR) systems. However, conventional data augmentation methods, such as random cropping, may generate bad samples that are unrelated to a particular activity (e.g., background patches without salient motion information). As a result, data augmentation based on random cropping may negatively affect the overall performance of HAR systems. Humans, in contrast, tend to pay more attention to motion information when recognizing activities. In this work, we attempt to enhance the motion information in HAR systems and mitigate the influence of bad samples through a Siamese architecture, termed the Motion-patch-based Siamese Convolutional Neural Network (MSCNN). A motion patch is defined as a square region of the video that contains critical motion information. We propose a simple yet effective method for selecting these regions. To evaluate the proposed MSCNN, we conduct a number of experiments on the popular UCF-101 and HMDB-51 datasets. Both the mathematical model and the experimental results show that the proposed architecture is capable of enhancing the motion information and achieves comparable performance.
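
To make the notion of a motion patch concrete, the following is a minimal sketch of one possible selection rule, not the method proposed in this work (the abstract does not specify it). It assumes that motion can be approximated by the absolute difference between two consecutive grayscale frames, and it picks the square window with the largest summed difference. The function name `select_motion_patch`, its parameters, and the frame-difference criterion are all illustrative assumptions.

```python
def select_motion_patch(prev_frame, next_frame, patch_size):
    """Return (top, left) of the square window with the most motion energy.

    Hypothetical helper for illustration only; NOT the paper's method.
    Assumption: frames are equally sized 2D lists of grayscale values and
    motion energy is the summed absolute frame difference in the window.
    """
    height, width = len(prev_frame), len(prev_frame[0])
    if not 0 < patch_size <= min(height, width):
        raise ValueError("patch_size must be positive and fit inside the frame")

    # integral[y][x] holds the summed |difference| over rows < y and cols < x,
    # so any window sum can be read off with four lookups.
    integral = [[0.0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0.0
        for x in range(width):
            row_sum += abs(next_frame[y][x] - prev_frame[y][x])
            integral[y + 1][x + 1] = integral[y][x + 1] + row_sum

    best_energy, best_pos = -1.0, (0, 0)
    p = patch_size
    for top in range(height - p + 1):
        for left in range(width - p + 1):
            energy = (integral[top + p][left + p] - integral[top][left + p]
                      - integral[top + p][left] + integral[top][left])
            # Strict comparison keeps the first window on ties.
            if energy > best_energy:
                best_energy, best_pos = energy, (top, left)
    return best_pos
```

Under these assumptions, a crop taken at the returned position would be biased toward the moving region, whereas a randomly placed crop may fall entirely on static background.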
