Abstract

Temporal action detection is an important research topic in computer vision, and Temporal Action Proposal (TAP) generation is a key step in it for finding candidate action segments. This paper presents a new, effective, and efficient deep architecture for temporal action proposal generation in temporally untrimmed videos, named the action keyframe connection network. First, a two-stream network is adopted to extract frame-level features, which include appearance features and optical flow features. The temporal information helps the subsequent network determine whether a frame is the beginning or the end of an action. Second, a position discrimination network is designed to infer the probability of each frame being a starting frame or an ending frame. This network outputs a starting probability sequence and an ending probability sequence, which indicate the start and the end of the action, respectively. Finally, our network generates proposals by applying a specific threshold rule that combines points in the starting probability sequence with points in the ending probability sequence. We carry out experiments on the ActivityNet dataset to compare the proposed method with state-of-the-art methods. The experimental results show that our method achieves superior performance over the other methods.
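
The abstract does not spell out the threshold rule, so the following is only a minimal sketch of how starting and ending probability sequences could be combined into proposals. The function name `generate_proposals`, the cutoff `threshold`, the length cap `max_length`, and the product-based score are all assumptions made for illustration, not the rule used in the paper.

```python
from typing import List, Tuple

Proposal = Tuple[int, int, float]  # (start_frame, end_frame, score)


def generate_proposals(
    start_probs: List[float],
    end_probs: List[float],
    threshold: float = 0.5,  # hypothetical cutoff, not taken from the paper
    max_length: int = 100,   # hypothetical cap on proposal length, in frames
) -> List[Proposal]:
    """Pair candidate start frames with later candidate end frames."""
    if len(start_probs) != len(end_probs):
        raise ValueError("probability sequences must have the same length")

    # Candidate boundaries: frames whose probability reaches the threshold.
    starts = [t for t, p in enumerate(start_probs) if p >= threshold]
    ends = [t for t, p in enumerate(end_probs) if p >= threshold]

    proposals: List[Proposal] = []
    for s in starts:
        for e in ends:
            # A proposal must end after it starts and respect the length cap.
            if s < e <= s + max_length:
                # Assumed score: product of the two boundary probabilities.
                proposals.append((s, e, start_probs[s] * end_probs[e]))

    # Highest-scoring proposals first.
    proposals.sort(key=lambda p: p[2], reverse=True)
    return proposals


start_probs = [0.1, 0.9, 0.2, 0.1, 0.6, 0.1]
end_probs = [0.0, 0.1, 0.2, 0.8, 0.1, 0.7]
for start, end, score in generate_proposals(start_probs, end_probs):
    print(start, end, round(score, 2))
```

In this toy input, frames 1 and 4 pass the start cutoff and frames 3 and 5 pass the end cutoff, so the sketch yields the segments (1, 3), (1, 5), and (4, 5), ranked by the assumed score.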
