Abstract

One of the key limitations of the many existing visual tracking method is that they are built upon low-level visual features and have limited predictability power of data semantics. To effectively fill the semantic gap of visual data in visual tracking with little supervision, we propose a tracking method which constructs a robust object appearance model via learning and transferring mid-level image representations using a deep network, i.e., Network in Network (NIN). First, we design a simple yet effective method to transfer the mid-level features learned from NIN on the source tasks with large scale training data to the tracking tasks with limited training data. Then, to address the drifting problem, we simultaneously utilize the samples collected in the initial and most previous frames. Finally, a heuristic schema is used to judge whether updating the object appearance model or not. Extensive experiments show the robustness of our method.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.