Abstract

In action recognition, many different network architectures for spatiotemporal feature extraction have been proposed and perform well on several mainstream datasets. This inevitably raises a question: are there transferable characteristics between different models? In this paper, we address this question by introducing a cross-architecture transfer learning scheme, dubbed soft transfer learning, which aims to overcome the divergence between different network structures. A multi-stage semi-supervised training procedure maintains internal consistency between two different models from bottom to top. To this end, we introduce two cross-structure metric strategies that compute the mismatch at different feature levels; together with an entropy classification loss, these are integrated into a three-stage supervision method. We additionally design a new network that learns from supervisor models trained on large-scale datasets. We fine-tune the supervision model and train our new model on the UCF101 and HMDB51 datasets. The experimental results demonstrate the feasibility of the soft transfer method, extend transfer learning in a broader sense, and show the flexibility of deploying existing models. Our method is designed to generalize easily to different networks in other computer vision tasks.
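The three-stage supervision described above could be sketched as a weighted sum of two feature-level mismatch terms and a classification loss. The following is a minimal illustrative sketch in NumPy; the function names, the choice of mean-squared error as the cross-structure metric, and the loss weights are all assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np


def feature_mismatch(f_student, f_teacher):
    """Mismatch between intermediate features of the two architectures.
    Here a plain mean-squared error over flattened features; the paper's
    own cross-structure metrics may differ (assumption)."""
    return float(np.mean((f_student.ravel() - f_teacher.ravel()) ** 2))


def softmax(x):
    # Numerically stable softmax over the last axis.
    z = np.exp(x - x.max(axis=-1, keepdims=True))
    return z / z.sum(axis=-1, keepdims=True)


def classification_loss(logits, labels):
    """Standard cross-entropy against ground-truth class labels."""
    p = softmax(logits)
    return float(-np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12)))


def soft_transfer_loss(low_s, low_t, high_s, high_t, logits, labels,
                       w_low=1.0, w_high=1.0, w_cls=1.0):
    """Combine a low-level mismatch, a high-level mismatch, and the
    classification loss into one objective (weights are assumed)."""
    return (w_low * feature_mismatch(low_s, low_t)
            + w_high * feature_mismatch(high_s, high_t)
            + w_cls * classification_loss(logits, labels))
```

In this reading, the student network is trained to match the supervisor's intermediate features at two depths ("from bottom to top") while simultaneously fitting the labels, so the gradient carries both cross-architecture consistency and task supervision.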

