Abstract
Classification of human actions is an ongoing research problem in computer vision. This review aims to scope the current literature on data fusion and action recognition techniques and to identify gaps and future research directions. Success in producing cost-effective and portable vision-based sensors has dramatically increased the number and size of datasets. The increase in the number of action recognition datasets intersects with advances in deep learning architectures and computational support, both of which offer significant research opportunities. Naturally, each action-data modality, such as RGB, depth, skeleton, and infrared (IR), has distinct characteristics; therefore, it is important to exploit the value of each modality for better action recognition. In this paper, we focus solely on data fusion and recognition techniques in the context of vision with an RGB-D perspective. We conclude by discussing research challenges, emerging trends, and possible future research directions.
Highlights
Human action recognition (HAR) has recently gained increasing attention from computer vision researchers with applications in robot vision, multimedia content search, video surveillance, and motion tracking systems
The following subsections discuss the fundamental variants of neural networks, and later we present some modern deep learning-based approaches used in RGB-D data
As performance demands rely on high-end hardware, support for multiple graphics processing units (GPUs) is a must when experimenting with big data-related problems
Summary
Human action recognition (HAR) has recently gained increasing attention from computer vision researchers with applications in robot vision, multimedia content search, video surveillance, and motion tracking systems. The development of low-cost sensors such as Microsoft Kinect [1], Intel RealSense [2], and Orbbec [3] has sparked further research into action recognition. These sensors collect data in various modalities such as RGB video, depth, skeleton, and IR. Each of these modalities has its own characteristics that can help address challenges related to action data and give computer vision researchers opportunities to examine vision data from different perspectives. RGB-D data acquisition and the sensors preferred by consumers are discussed in the following subsections.