Abstract
Detecting temporal actions in long and untrimmed videos is a challenging and important field in computer vision. Generating high-quality proposals is a key step in temporal action detection. A high-quality proposal usually contains two main characteristics. One is the temporal overlaps between proposals and action instances should be as large as possible. The another one is the number of generated proposals should be as few as possible. Inspired by the similarity comparison in face recognition and the similarity of action in same action segment, we design a module to compare the similarity for visual features extracted from visual feature encoder. We find out time points where the similarity of features changes shapely to generate candidate proposals. Then, we train a classifier to evaluate the candidate proposals whether contains or not contains action instances. The experiments suggest that our method outperforms other temporal action proposal generation methods in THUMOS-14 dataset and ActivityNet-v1.3 dataset. In addition, our method still outperforms other methods when using different visual features extracted from different networks.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.