Abstract

Human daily activity recognition has been an active topic in computer vision for decades. Despite best efforts, activity recognition in naturally uncontrolled settings remains a challenging problem. Recently, by perceiving depth and visual cues simultaneously, RGB-D cameras have greatly boosted the performance of activity recognition. However, owing to practical difficulties, the publicly available RGB-D data sets are not sufficiently large for benchmarking given the diversity of their activities, subjects, and backgrounds. This severely limits the applicability of complex learning-based recognition approaches. To address this issue, this article provides a large-scale RGB-D activity data set built by merging five public RGB-D data sets that differ from each other in many aspects, such as action length, subject nationality, and camera angle. The merged data set comprises 4528 samples depicting 7 action categories (up to 46 subcategories) performed by 74 subjects. To verify the difficulty of the data set, three feature representation methods are evaluated: depth motion maps, the spatiotemporal depth cuboid similarity feature, and curvature scale space. Results show that the merged large-scale data set is more realistic and challenging and is therefore more suitable for benchmarking.

Highlights

  • Human daily activity recognition via a low-cost vision system is essential for providing appropriate health care to elderly people[1,2] or to patients with early-stage Alzheimer’s disease.[3]

  • The number of K-means clusters is critical to the Bag of Words (BoW) model.

  • We integrate five public RGB-D data sets to build a large-scale RGB-D activity data set for human daily activity recognition.
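The second highlight notes that the number of K-means clusters is critical to the BoW model. As a minimal illustration of why the cluster count matters, the sketch below (not the authors' implementation; a generic numpy-only version under simplifying assumptions) builds a visual codebook with K-means and quantizes local features into a BoW histogram whose dimensionality equals the chosen k:

```python
import numpy as np

def kmeans(features, k, iters=20, seed=0):
    """Minimal K-means: returns k cluster centers (the visual codebook)."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), k, replace=False)].astype(float)
    for _ in range(iters):
        # Assign each local feature to its nearest center.
        dists = np.linalg.norm(features[:, None] - centers[None], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each center; keep the old one if its cluster is empty.
        for c in range(k):
            if np.any(labels == c):
                centers[c] = features[labels == c].mean(axis=0)
    return centers

def bow_histogram(features, centers):
    """Quantize local features against the codebook and return a
    normalized word-count histogram (the BoW representation)."""
    dists = np.linalg.norm(features[:, None] - centers[None], axis=2)
    labels = dists.argmin(axis=1)
    hist = np.bincount(labels, minlength=len(centers)).astype(float)
    return hist / hist.sum()
```

Too few clusters merge distinct local patterns into one "word"; too many fragment the same pattern across words, so the histogram generalizes poorly across subjects.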


Introduction

Human daily activity recognition via a low-cost vision system is essential for providing appropriate health care to elderly people[1,2] or to patients with early-stage Alzheimer’s disease.[3] Human daily activity analysis is also critical for developing intelligent surveillance systems. Various feature extraction methods have been proposed. Chu et al.[5] proposed block multibilateral two-dimensional linear discriminant analysis to extract features from the contour of a moving object. Oreifej and Liu[6] proposed the histogram of oriented 4D normals to describe depth sequences for activity recognition; it represents a depth sequence with a histogram that captures the distribution of surface normal orientation in 4D space.
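Among the depth-based representations evaluated in this article, the depth motion map (DMM) is the simplest to sketch: motion energy is accumulated by summing absolute differences between consecutive depth frames, so moving regions light up while static background cancels out. The following is a hedged, numpy-only illustration of that idea (a single-view simplification; published DMM variants typically project each depth frame onto front, side, and top views first):

```python
import numpy as np

def depth_motion_map(depth_seq, threshold=0.0):
    """Accumulate absolute frame-to-frame differences of a depth
    sequence (T, H, W) into a single (H, W) motion-energy map."""
    depth_seq = np.asarray(depth_seq, dtype=float)
    diffs = np.abs(np.diff(depth_seq, axis=0))  # (T-1, H, W) frame differences
    diffs[diffs <= threshold] = 0.0             # optional gate for sensor noise
    return diffs.sum(axis=0)                    # motion regions accumulate energy
```

For example, a sequence in which only one pixel's depth changes yields a map that is zero everywhere except at that pixel, whose value is the total accumulated change.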

