Abstract

Human action recognition (HAR) is the process by which computers analyze video data to determine the categories of action shown in the video. It has a wide range of applications, such as video surveillance, human–computer interaction, and autonomous driving. The spatio-temporal features required for video analysis are typically extracted from RGB pixels, which entails significant computational complexity and makes real-time action recognition challenging. In the compressed domain, however, sparse representations such as motion vectors, quantization parameters, transform coefficients, and residuals provide comparable scene semantic information at reduced complexity. This paper provides an overview of research on action recognition in the compressed domain. It includes a comprehensive review of both traditional and deep learning-based methods published between 2000 and 2023, focusing on compression standards and compression parameters. First, we summarize the video compression standards and compression algorithms. Second, we classify compressed-domain action recognition methods into traditional and deep learning-based approaches. Third, we introduce public datasets and evaluation metrics, analyze the characteristics of these methods, and compare their performance. Finally, we highlight challenges and suggest future research directions.
