Abstract

In video recognition, rank-pooling operators are a class of models for ordering video sequences; they act on either the raw inputs or the intermediate feature maps of a convolutional neural network (CNN). However, such models are currently limited to optimizing a linear ranking function with Rank-SVM or Rank-SVR. In this paper, we first propose a CNN architecture called the RGB Rank Pooling Dynamic Network (RGB-RPDN), which maps a video to multiple frame-level dynamic spaces of the same size as the input. Importantly, a standard classifier developed for 2D images (e.g., FC layers or a CNN) can be placed after the generated representation for action classification, so the joint architecture can be trained end to end. Second, we analyze how flow-level evolution can be modeled by the hand-crafted rank-pooling machine, and extend the frame-level dynamic space to a flow-level one with the Flow Rank Pooling Dynamic Network (Flow-RPDN). Third, we formulate equivalence relations between hand-crafted rank pooling and RPDN, and qualitatively compare their computational costs. Finally, the frame-level and flow-level pipelines are combined by late fusion to produce the final prediction. Specifically, we train the two-stream RPDN on UCF101 and HMDB51, initializing its parameters from models pre-trained on the large-scale Kinetics dataset. Experimental results demonstrate that RPDN outperforms hand-crafted rank-pooling machines by a large margin and achieves higher classification accuracy in action recognition.
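The late-fusion step mentioned above can be illustrated with a minimal sketch. It assumes each stream outputs per-class logits and that fusion is a weighted average of softmax scores; the abstract does not specify the fusion rule, so the function name `late_fusion` and the parameter `rgb_weight` are hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def late_fusion(rgb_logits, flow_logits, rgb_weight=0.5):
    """Fuse two streams by a weighted average of their class probabilities.

    Assumption: weighted score averaging stands in for the paper's
    fusion rule, which the abstract does not describe.
    Inputs have shape (num_videos, num_classes).
    """
    rgb_probs = softmax(np.asarray(rgb_logits, dtype=float))
    flow_probs = softmax(np.asarray(flow_logits, dtype=float))
    fused = rgb_weight * rgb_probs + (1.0 - rgb_weight) * flow_probs
    return fused, fused.argmax(axis=-1)


# Toy example: one video, three classes, streams disagree.
rgb_logits = [[2.0, 0.0, 0.0]]
flow_logits = [[0.0, 3.0, 0.0]]
fused_probs, predictions = late_fusion(rgb_logits, flow_logits)
```

With equal weights the more confident stream decides the toy example; setting `rgb_weight=1.0` reduces the output to the frame-level stream alone.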
