Abstract

Recognizing normal and anomalous events in long and complex videos with multiple sub-activities has received considerable attention in recent years. This task is more challenging than traditional action recognition in short, relatively homogeneous video clips. Beyond the difficulty of recognizing activities in long videos, a further challenge is varying activity rhythm: the rhythm of the sub-actions within an activity can differ, which can degrade the performance of activity recognition methods. In this article, five video activity recognition methods were evaluated on two publicly available datasets of long and complex videos, Breakfast and VIRAT. Extensive experiments and analyses showed that, among these methods, VideoGraph performed distinctly better than the others and maintained high accuracy even when the test videos were subjected to severe rhythm changes. The results indicate that VideoGraph is less sensitive to varying rhythms than the other investigated methods. We also observed performance improvements in VideoGraph after changing some of its architecture parameters.

Highlights

  • There is an emerging interest in automating human activity recognition using intelligent systems

  • We provide a comprehensive evaluation of five video activity recognition methods on two highly challenging activity recognition datasets with long and complex videos

  • In addition to applying the investigated methods to videos with the original rhythm (R0), we demonstrate the impact of varying rhythm using three other rhythms (R1, R2 and R3) [33]


Summary

Introduction

There is an emerging interest in automating human activity recognition using intelligent systems. Recognizing activities in videos has received significant attention in recent years. Most work in this emerging field recognizes human actions using datasets such as UCF101 [13], KTH [14], HMDB51 [15], and Kinetics [16]. These datasets consist of relatively short and homogeneous video clips, which are generally well segmented and contain only one action event, in which human actions take a few seconds to unfold [17]. Some methods developed on such short, homogeneous clips are highly likely to face challenges when recognizing normal and anomalous events in datasets of long and complex videos with multiple sub-actions, such as Breakfast [20] and VIRAT [21].

