Abstract
Accurate and fast scale estimation of targets is a challenging research problem in visual object tracking. Most trackers employ an exhaustive scale search to estimate the target size. This exhaustive search strategy is computationally expensive, and it struggles when faced with large scale variations. In this paper, a scale-adaptive method is proposed which not only improves tracking performance but also greatly reduces the computational cost and increases the tracking speed. Building on the scale estimation method of SAMF, the original 7 fixed scale sizes are reduced to 3, and an adaptive scale size is added. The three fixed scales are used to determine the direction of scale change, and the rate of change of the APCE between the previous frame and the current frame is used to control the additional adaptive scale size. The optimal scale estimate is then selected. Additionally, we investigate the model update strategy to further improve tracking accuracy. Extensive experiments on the OTB50, OTB100 and VOT-ST2019 datasets demonstrate that the proposed method handles challenging videos well compared with the baseline tracker. On OTB, we obtain a gain of 7.0% in Distance Precision and 18.8% in Centre Location Error on the 43 selected videos with the scale variation attribute, and a mean gain of 6.2% in Precision and 4.6% in Success plots on OTB50, compared with the baseline tracker SAMF. Furthermore, the proposed approach improves the tracking speed by 34% in FPS compared with SAMF.
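The scale search described in the abstract can be illustrated with a minimal Python sketch. The `apce` function follows the commonly used definition of average peak-to-correlation energy, (F_max - F_min)^2 divided by the mean of (F - F_min)^2 over the response map. Everything else is an assumption made for illustration and is not taken from the paper: the function `estimate_scale`, the callable `score_at` (assumed to return the peak filter response at a given scale factor), the parameters `step` and `gain`, and in particular the rule that maps the APCE change rate to the adaptive scale factor.

```python
def apce(response):
    """Average peak-to-correlation energy of a 2-D response map (list of rows)."""
    values = [v for row in response for v in row]
    f_max, f_min = max(values), min(values)
    energy = sum((v - f_min) ** 2 for v in values) / len(values)
    if energy == 0:
        return 0.0  # flat response map: no distinguishable peak
    return (f_max - f_min) ** 2 / energy


def estimate_scale(score_at, prev_apce, curr_apce, step=1.05, gain=1.0):
    """Pick a scale factor from three fixed scales plus one adaptive scale.

    score_at, step and gain are hypothetical names; the adaptive rule
    below is an illustrative assumption, not the paper's formula.
    """
    # Three fixed scales indicate the direction of scale change.
    fixed = [1.0 / step, 1.0, step]
    best = max(fixed, key=score_at)
    if best == 1.0:
        return 1.0  # no scale change indicated

    # Relative APCE change between the previous and current frame.
    rate = abs(curr_apce - prev_apce) / max(prev_apce, 1e-12)

    # Assumed rule: a larger APCE change pushes the adaptive scale
    # further in the direction chosen by the fixed scales.
    adaptive = best ** (1.0 + gain * rate)

    # Keep whichever candidate scores higher.
    return max([best, adaptive], key=score_at)
```

In this sketch only four scale candidates are evaluated per frame instead of seven, which is the source of the speed-up the abstract describes; how the actual tracker derives the adaptive scale from the APCE change rate is specified in the paper, not here.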
Highlights
Visual tracking is a classical computer vision research topic which can be employed in many fields, such as behaviour analysis, surveillance, autonomous driving, robotics, etc. [1]–[3].
We find that the main disadvantage of the SAMF method is that its tracking speed is greatly reduced by the excessive number of search scales.
We conduct experiments on the OTB benchmark datasets [39]. All sequences are annotated with 11 attributes which cover various challenging factors: scale variation (SV), occlusion (OCC), illumination variation (IV), motion blur (MB), deformation (DEF), fast motion (FM), out-of-plane rotation (OPR), background clutter (BC), out-of-view (OV), in-plane rotation (IPR) and low resolution (LR).
Summary
Visual tracking is a classical computer vision research topic which can be employed in many fields, such as behaviour analysis, surveillance, autonomous driving, robotics, etc. [1]–[3]. There are two main methods for scale estimation in CF-based trackers: 1) applying one translation correlation filter at multiple scales, an approach represented by the SAMF tracker [11] (the SAMF tracker uses 7 different scales). The target estimation component is trained to predict the overlap between the target object and an estimated bounding box. It achieves excellent tracking results; however, ATOM uses offline training for scale estimation and can achieve real-time tracking only on a GPU. Inspired by the temporal regularization method STRCF [18], the model update rate is determined according to the change of the model parameters. The proposed scale estimation approach is generic and can be incorporated into trackers which use a SAMF-based scale method. A model update scheme which simulates temporal regularization is adopted to further enhance the proposed tracking method in more challenging scenarios.