Abstract

Human action recognition and video summarization are challenging tasks for several computer vision applications, including video surveillance, criminal investigations, and sports. In long videos, it is difficult to search for a specific action and/or person. Human action recognition approaches presented in the literature usually deal with videos that contain only a single person and recognize only that person's action. This paper proposes an effective approach to multiple human action detection, recognition, and summarization. The multiple action detection stage extracts human body silhouettes and then generates a separate sequence for each person using a motion detection and tracking method. Each extracted sequence is divided into shots that represent homogeneous actions, using the similarity between each pair of consecutive frames. From the histogram of oriented gradients (HOG) of the Temporal Difference Map (TDMap) computed over the frames of each shot, the action is recognized by comparing the generated HOG with the HOGs built in the training phase, which represent the HOGs of many actions extracted from a set of training videos. The action is also recognized from the TDMap images using a proposed CNN model. Action summarization is then performed for each detected person. The efficiency of the proposed approach is demonstrated by the results obtained, mainly for multi-action detection and recognition.
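
As an illustration of this pipeline, the following Python sketch shows one plausible way to build a temporal difference map from the frames of a shot, extract its HOG descriptor, and match it against training descriptors with a nearest-neighbour comparison. The function names, the accumulation of absolute frame differences as the TDMap, the HOG parameters, and the Euclidean-distance matching are assumptions for illustration, not the authors' exact implementation; the sketch relies on OpenCV, NumPy, and scikit-image.

    import cv2
    import numpy as np
    from skimage.feature import hog

    def temporal_difference_map(frames):
        """Accumulate absolute differences between consecutive grayscale frames
        into a single motion map (a simplified stand-in for the paper's TDMap)."""
        tdmap = np.zeros(frames[0].shape, dtype=np.float32)
        for prev, curr in zip(frames[:-1], frames[1:]):
            tdmap += cv2.absdiff(curr, prev).astype(np.float32)
        # Normalize to 0-255 so the descriptor is independent of shot length.
        return cv2.normalize(tdmap, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)

    def tdmap_hog(frames, size=(128, 128)):
        """Resize the TDMap to a fixed size and compute its HOG descriptor."""
        resized = cv2.resize(temporal_difference_map(frames), size)
        return hog(resized, orientations=9, pixels_per_cell=(8, 8),
                   cells_per_block=(2, 2), block_norm='L2-Hys')

    def recognize_action(shot_frames, train_hogs, train_labels):
        """Assign the label of the closest training HOG (Euclidean distance)."""
        query = tdmap_hog(shot_frames)
        distances = [np.linalg.norm(query - h) for h in train_hogs]
        return train_labels[int(np.argmin(distances))]

In the same spirit, the TDMap images could instead be fed to a small CNN classifier, which corresponds to the second recognition route mentioned in the abstract.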

Highlights

  • Video technologies face several challenges and difficulties, mainly attributed to the extraction of information in real time from a large number of videos

  • We propose a new approach that combines multiple human action recognition and summarization

  • To evaluate the performance of the multiple action recognition approach, the Weizmann and UCF-ARG human action recognition datasets are used to train the proposed approach, while the proposed dataset as well as the PET09 dataset are used for testing


Summary

Introduction

Video technologies are facing several challenges and difficulties, mainly attributed to the extraction of information in real time from a large number of videos. Since the scenes can change and the cameras can move, summarization in this case is carried out by determining the video sequences (shots) that represent the same scenes [8,9,10,11,12,13,14,15]. This allows keyframes to be selected using extracted features and appropriate clustering methods. The complexity and variety of scenes in a video make the design of generic methods impossible [40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56].
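
The shot segmentation and keyframe selection just described can be sketched as follows. The histogram-correlation threshold, the number of clusters, and the use of k-means are illustrative assumptions rather than the specific methods of the cited works; the sketch uses OpenCV, NumPy, and scikit-learn.

    import cv2
    import numpy as np
    from sklearn.cluster import KMeans

    def frame_histogram(frame, bins=64):
        """Normalized grayscale intensity histogram used as a frame descriptor."""
        hist = cv2.calcHist([frame], [0], None, [bins], [0, 256]).flatten()
        return (hist / (hist.sum() + 1e-8)).astype(np.float32)

    def split_into_shots(frames, threshold=0.4):
        """Start a new shot when the histogram correlation between two
        consecutive frames drops below an (assumed) threshold."""
        shots, current = [], [frames[0]]
        for prev, curr in zip(frames[:-1], frames[1:]):
            sim = cv2.compareHist(frame_histogram(prev), frame_histogram(curr),
                                  cv2.HISTCMP_CORREL)
            if sim < threshold:
                shots.append(current)
                current = []
            current.append(curr)
        shots.append(current)
        return shots

    def select_keyframes(shot, n_clusters=3):
        """Cluster frame descriptors and keep the frame nearest each centroid."""
        feats = np.array([frame_histogram(f) for f in shot])
        km = KMeans(n_clusters=min(n_clusters, len(shot)), n_init=10).fit(feats)
        return [shot[int(np.argmin(np.linalg.norm(feats - c, axis=1)))]
                for c in km.cluster_centers_]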

