Abstract
Temporal action localization is a more challenging vision task than action recognition because the videos to be analyzed are usually untrimmed and contain multiple action instances. In this paper, we investigate the potential of recurrent neural networks for solving this problem along three critical aspects: high-performance features, high-quality temporal segments, and an effective recurrent network architecture. First, we introduce a two-stream (spatial and temporal) network for feature extraction. Second, we propose a novel temporal selective search method to generate temporal segments of variable lengths. Finally, we design a two-branch LSTM architecture for category prediction and confidence score computation. Our proposed approach, along with its key components, namely segment generation and the classification architecture, is evaluated on the THUMOS'14 dataset and achieves promising performance compared with other state-of-the-art methods.
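To make the segment-generation idea concrete, the following is a minimal sketch of a *temporal* selective search in the spirit the abstract describes: short initial segments are hierarchically merged by feature similarity, and every intermediate segment is kept as a variable-length proposal. The function name, the cosine-similarity merge criterion, and the initial segment length are illustrative assumptions, not the paper's exact algorithm.

```python
import math

def temporal_selective_search(features, init_len=2):
    """Generate variable-length temporal segment proposals by hierarchically
    merging adjacent segments (illustrative sketch, not the paper's method).

    features: list of per-frame feature vectors (lists of floats).
    Returns a list of (start, end) proposals with end exclusive.
    """
    n = len(features)
    dim = len(features[0])
    # Initial short segments of length init_len.
    segments = [(i, min(i + init_len, n)) for i in range(0, n, init_len)]
    proposals = list(segments)

    def seg_feat(seg):
        # Average-pool the frame features inside a segment.
        s, e = seg
        return [sum(f[d] for f in features[s:e]) / (e - s) for d in range(dim)]

    def similarity(a, b):
        # Cosine similarity between the two segments' pooled features.
        fa, fb = seg_feat(a), seg_feat(b)
        dot = sum(x * y for x, y in zip(fa, fb))
        na = math.sqrt(sum(x * x for x in fa)) or 1.0
        nb = math.sqrt(sum(x * x for x in fb)) or 1.0
        return dot / (na * nb)

    # Greedily merge the most similar adjacent pair until one segment remains,
    # recording every merged segment as a new proposal.
    while len(segments) > 1:
        best = max(range(len(segments) - 1),
                   key=lambda i: similarity(segments[i], segments[i + 1]))
        merged = (segments[best][0], segments[best + 1][1])
        segments[best:best + 2] = [merged]
        proposals.append(merged)
    return proposals
```

Collecting all intermediate merges is what yields proposals at multiple temporal scales, analogous to how selective search produces object proposals of varying sizes in images.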
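The two-branch design can be sketched as a shared LSTM encoder over a segment's frame features with two output heads, one producing class logits and one a scalar confidence score. This is a minimal NumPy sketch under assumed sizes and head designs; the class and layer names are hypothetical, and the paper's actual architecture may differ.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TwoBranchLSTM:
    """Illustrative sketch: a single LSTM encodes a temporal segment;
    a category branch outputs class logits and a confidence branch
    outputs a scalar score (all sizes are assumptions)."""

    def __init__(self, in_dim, hid_dim, n_classes, seed=0):
        rng = np.random.default_rng(seed)
        # Fused gate weights for input, forget, output, and cell gates.
        self.W = rng.normal(0.0, 0.1, (4 * hid_dim, in_dim + hid_dim))
        self.b = np.zeros(4 * hid_dim)
        self.Wc = rng.normal(0.0, 0.1, (n_classes, hid_dim))  # category branch
        self.Ws = rng.normal(0.0, 0.1, (1, hid_dim))          # confidence branch
        self.hid = hid_dim

    def forward(self, frames):
        h = np.zeros(self.hid)
        c = np.zeros(self.hid)
        for x in frames:  # one LSTM step per frame feature
            z = self.W @ np.concatenate([x, h]) + self.b
            i, f, o, g = np.split(z, 4)
            i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
            c = f * c + i * g
            h = o * np.tanh(c)
        logits = self.Wc @ h            # category prediction branch
        conf = sigmoid(self.Ws @ h)[0]  # confidence score branch
        return logits, conf
```

Sharing the recurrent encoder while splitting the heads lets the confidence branch score how action-like a proposal is independently of which category the classification branch assigns.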