Abstract

Despite its high computational cost, human activity recognition is one of the most widely researched areas in computer vision. This work focuses on deep learning based human activity recognition in complex live video. Several works have stated that end-to-end models can learn video features inherently; however, capturing features that are effective for recognizing the activity remains challenging. To alleviate this problem, we identify the proposals in which the actual event occurs within the sequence of frames. In this paper, a deep CNN and a stacked LSTM are used to select deep action features and to maintain the long-term dependencies (i.e., previous information) of those features, respectively. We also propose a Graph Convolutional Network (G-ConvNET), which encodes features from the graph to label the activity. The proposed method is evaluated on the THUMOS14 dataset, which comprises twenty complex sports activities. The experimental results show a significant improvement of 8.6% in mean Average Precision (mAP) over recent methods.
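The abstract does not spell out the G-ConvNET layer, so the following is only a minimal sketch of a standard graph-convolution step (the Kipf–Welling formulation, H' = ReLU(D^-1/2 (A+I) D^-1/2 H W)); the matrix names, sizes, and the choice of ReLU are illustrative assumptions, not details taken from the paper.

```python
import math

def matmul(X, Y):
    """Plain-Python matrix product of X (m x k) and Y (k x n)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def graph_conv(A, H, W):
    """One illustrative graph-convolution layer.

    A: (n, n) adjacency matrix over the feature graph (assumed structure).
    H: (n, f_in) node features, e.g. per-proposal deep features.
    W: (f_in, f_out) learnable weight matrix.
    """
    n = len(A)
    # Add self-loops: A_hat = A + I.
    A_hat = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    # Symmetric normalization: D^-1/2 A_hat D^-1/2.
    d = [sum(row) for row in A_hat]
    A_norm = [[A_hat[i][j] / math.sqrt(d[i] * d[j]) for j in range(n)] for i in range(n)]
    # Propagate features and apply ReLU.
    Z = matmul(matmul(A_norm, H), W)
    return [[max(z, 0.0) for z in row] for row in Z]

# Toy usage: 2 nodes, 2-dim features, identity weights.
A = [[0.0, 1.0], [1.0, 0.0]]
H = [[1.0, 2.0], [3.0, 4.0]]
W = [[1.0, 0.0], [0.0, 1.0]]
print(graph_conv(A, H, W))  # each node's output mixes its neighbors' features
```

In this toy case the normalized adjacency averages the two nodes' features, so both output rows are identical, which is the smoothing effect a graph convolution applies before classification.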
