Abstract

Detecting temporal actions in long and untrimmed videos is a challenging and important field in computer vision. Generating high-quality proposals is a key step in temporal action detection. A high-quality proposal usually contains two main characteristics. One is the temporal overlaps between proposals and action instances should be as large as possible. The another one is the number of generated proposals should be as few as possible. Inspired by the similarity comparison in face recognition and the similarity of action in same action segment, we design a module to compare the similarity for visual features extracted from visual feature encoder. We find out time points where the similarity of features changes shapely to generate candidate proposals. Then, we train a classifier to evaluate the candidate proposals whether contains or not contains action instances. The experiments suggest that our method outperforms other temporal action proposal generation methods in THUMOS-14 dataset and ActivityNet-v1.3 dataset. In addition, our method still outperforms other methods when using different visual features extracted from different networks.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call