Abstract

In this work, we study how a natural language network can be exploited to improve tracking performance. We propose a novel architecture that combines the class and visual information present in tracking. To this end, we introduce a multimodal feature association network, which allows us to correlate the target class with its appearance during training and aids the localization of the target during inference. Specifically, we first use an appearance model to extract the target's visual features, from which we obtain appearance cues such as shape and color. To employ target class information, we design a lightweight learned embedding network that embeds the target class into a feature representation. The association network of our architecture contains a multimodal fusion module and a predictor module. The fusion module combines the class and appearance features, yielding multimodal features with more expressive representations for the subsequent module. The predictor module determines the target location in the current frame, through which we associate the class with the appearance. The class embedding module can thus learn appearance cues through back-propagation. To verify the abilities of our method, we use the official training and test splits of LaSOT, which provide annotated images and classes, to perform experiments. In particular, we analyze the imbalance in the samples and employ a class validator discriminator to alleviate this problem. Extensive experimental results on LaSOT, UAV20L and UAV123@10fps demonstrate that our method achieves competitive results while maintaining real-time speed.
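The data flow described above (class embedding, fusion with appearance features, location prediction) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify layer types, dimensions, or the fusion operator, so the concatenation-plus-linear fusion, the per-location scoring, and every name below (`ClassEmbedding`, `FusionModule`, `Predictor`, `track_step`) are assumptions made for illustration only. Weights are random rather than learned.

```python
import random


def make_matrix(rows, cols, rng):
    """Small random weight matrix, stored as a list of rows."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vec):
    """Matrix-vector product on plain Python lists."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


class ClassEmbedding:
    """Hypothetical lookup table: class index -> feature vector."""

    def __init__(self, num_classes, dim, rng):
        self.table = make_matrix(num_classes, dim, rng)

    def __call__(self, class_id):
        return self.table[class_id]


class FusionModule:
    """Assumed fusion: concatenate both feature vectors, then linear + ReLU."""

    def __init__(self, class_dim, visual_dim, out_dim, rng):
        self.weight = make_matrix(out_dim, class_dim + visual_dim, rng)

    def __call__(self, class_feat, visual_feat):
        joint = class_feat + visual_feat  # list concatenation
        return [max(0.0, v) for v in matvec(self.weight, joint)]


class Predictor:
    """Assumed predictor: score each candidate location, pick the highest."""

    def __init__(self, in_dim, rng):
        self.weight = make_matrix(1, in_dim, rng)

    def __call__(self, fused_per_location):
        scores = [matvec(self.weight, f)[0] for f in fused_per_location]
        best = max(range(len(scores)), key=scores.__getitem__)
        return best, scores


def track_step(class_id, visual_feats, embed, fuse, predict):
    """One frame: fuse the class feature with each location's appearance feature."""
    class_feat = embed(class_id)
    fused = [fuse(class_feat, v) for v in visual_feats]
    return predict(fused)


rng = random.Random(0)
embed = ClassEmbedding(num_classes=5, dim=4, rng=rng)
fuse = FusionModule(class_dim=4, visual_dim=6, out_dim=8, rng=rng)
predict = Predictor(in_dim=8, rng=rng)

# Stand-in appearance features for 9 candidate locations in the current frame.
visual_feats = [[rng.random() for _ in range(6)] for _ in range(9)]
best_location, scores = track_step(2, visual_feats, embed, fuse, predict)
```

In a trained version of such a pipeline, the loss on the predicted location would propagate gradients back through the fusion step into the embedding table, which is how the class embedding could come to encode appearance cues.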
