Abstract
Recently, Siamese-based trackers have shown outstanding performance in the visual object tracking community, but they seldom pay attention to inter-branch interaction or to intra-branch fusion of features from different convolutional layers. In this paper, we build a comprehensive Siamese network for object tracking that consists of a mutual learning subnetwork (M-net) and a feature fusion subnetwork (F-net), each of which is a Siamese network with a special function. M-net helps the two branches mine dependencies from each other, so the object template is adaptively updated to a certain extent. F-net fuses convolutional features from different levels to make full use of both spatial and semantic information. We also design a global-local channel attention (GLCA) module in F-net that captures channel dependencies for proper feature fusion. Our method uses ResNet as the feature extractor and is trained offline in an end-to-end manner. We evaluate it on several well-known benchmarks, including OTB2013, OTB2015, VOT2015, VOT2016, NFS, and TC128. Extensive experiments demonstrate that our method achieves competitive results while running at real-time speed.
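To make the GLCA idea concrete, below is a minimal PyTorch sketch of a channel attention module that combines a global branch (a squeeze-and-excitation style bottleneck over globally pooled features) with a local branch (a 1D convolution over neighboring channels, in the style of ECA). The class name `GLCA` and this particular two-branch design are assumptions for illustration; the paper's exact architecture may differ.

```python
import torch
import torch.nn as nn

class GLCA(nn.Module):
    """Hypothetical global-local channel attention sketch.

    Assumes a global branch (SE-style bottleneck MLP over the pooled
    channel descriptor) and a local branch (ECA-style 1D convolution
    across channels); the paper's actual design may differ.
    """
    def __init__(self, channels: int, reduction: int = 16, k: int = 3):
        super().__init__()
        # Global branch: models dependencies among all channels.
        self.global_fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        # Local branch: 1D conv captures dependencies among neighboring channels.
        self.local_conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        # Squeeze spatial dimensions into a per-channel descriptor.
        s = x.mean(dim=(2, 3))                          # (B, C)
        g = self.global_fc(s)                           # global channel dependencies
        l = self.local_conv(s.unsqueeze(1)).squeeze(1)  # local channel dependencies
        w = self.sigmoid(g + l).view(b, c, 1, 1)        # fused attention weights
        return x * w                                    # reweight channels before fusion

# Usage: reweight multi-level convolutional features prior to fusion.
feats = torch.randn(2, 256, 25, 25)
print(GLCA(256)(feats).shape)  # torch.Size([2, 256, 25, 25])
```

In this sketch, the two branches produce complementary channel weights that are summed and passed through a sigmoid, so fusion can emphasize channels supported by either global or local evidence.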