Abstract
The target and background change continuously during long-term tracking, which makes accurate prediction of the target very challenging. Correlation filter algorithms based on handcrafted features struggle to meet practical requirements because of their limited feature representation ability. To improve tracking performance and robustness, this work incorporates an improved hierarchical convolutional features model into a correlation filter framework for visual object tracking. First, the objective function is formulated as a lasso regression, and a sparse, temporally low-rank filter is learned to make the model more interpretable. Second, features from the last layer and the second pooling layer of the convolutional neural network are extracted to predict the target position from coarse to fine. In addition, response maps are computed separately with the filter learned from the first frame and the filter learned from the current frame, and the target position is obtained by locating the maximum value in the response map. The filter model is updated only when both maximum responses satisfy the threshold condition. The proposed tracker is evaluated on the TC-128 and OTB2015 benchmarks, which include more than 100 video sequences. Extensive experiments show that it achieves competitive performance against state-of-the-art trackers: on OTB2015, its distance precision rate and overlap success rate are 0.829 and 0.695, respectively. The proposed algorithm effectively addresses long-term object tracking in complex scenes.
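The conditional update described in the abstract can be illustrated with a minimal sketch. Several details are assumptions made only for illustration and are not the paper's settings: the position is read from the current filter's response map, both thresholds and the learning rate are placeholder values, and the update is a linear interpolation between the old and newly learned coefficients (the usual scheme in correlation filter trackers). The names `peak` and `track_step` are hypothetical.

```python
# Hypothetical sketch of a threshold-gated filter update.
# Thresholds, learning rate and the interpolation rule are assumptions.


def peak(response):
    """Return (value, (row, col)) of the maximum entry of a 2-D response map."""
    best_val, best_pos = float("-inf"), (0, 0)
    for r, row in enumerate(response):
        for c, v in enumerate(row):
            if v > best_val:
                best_val, best_pos = v, (r, c)
    return best_val, best_pos


def track_step(resp_first, resp_current, model, new_model,
               thr_first=0.3, thr_current=0.3, lr=0.01):
    """One tracking step.

    resp_first   -- response map of the filter learned on the first frame
    resp_current -- response map of the current (continuously updated) filter
    model        -- current filter coefficients (flat list of floats)
    new_model    -- filter coefficients learned on the newest frame
    Returns (target position, possibly updated model, whether it was updated).
    """
    max_first, _ = peak(resp_first)
    max_current, position = peak(resp_current)
    # Update only when both peaks are confident; otherwise keep the old model
    # so that occlusion or drift does not corrupt it.
    if max_first >= thr_first and max_current >= thr_current:
        model = [(1 - lr) * m + lr * n for m, n in zip(model, new_model)]
        return position, model, True
    return position, model, False
```

Keeping the first-frame filter fixed gives a reference that cannot drift, so a low peak on its map is a cheap signal that the current frame is unreliable (for example, under occlusion) and should not be learned from.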
Highlights
Visual object tracking [1,2,3,4] is one of the most fundamental and challenging research problems in computer vision; it combines techniques from several fields, such as image processing, pattern recognition, and computer applications. The essence of video moving-target tracking is to analyze the captured image sequence through image processing techniques.
When the target scale keeps increasing, the convolution computation for extracting target features and training filters grows, which reduces tracking speed. The kernel correlation filter (KCF) algorithm [11] further improved the circulant structure of tracking-by-detection with kernels (CSK) algorithm by using histogram of oriented gradients (HOG) features to detect and track the target, which improved tracking accuracy.
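The closed-form training that makes KCF fast can be sketched in one dimension. This is a minimal sketch under stated assumptions: a linear kernel, a single raw-intensity channel instead of HOG, illustrative values for the regularizer `lam` and the label width `sigma`, and a naive O(N²) DFT so that only the standard library is needed. Because the training set is every cyclic shift of one sample, the ridge regression decouples into one division per frequency.

```python
import cmath
import math
import random


def dft(x):
    """Naive O(N^2) discrete Fourier transform (stdlib only)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Inverse of dft()."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def gaussian_label(n, sigma=2.0):
    """Desired response: Gaussian peak at index 0, wrapping around circularly."""
    return [math.exp(-min(t, n - t) ** 2 / (2 * sigma ** 2)) for t in range(n)]


def train(x, y, lam=1e-3):
    """Ridge regression over all cyclic shifts of x, solved per frequency.

    With a linear kernel the kernel auto-correlation in the Fourier domain is
    conj(x_hat) * x_hat = |x_hat|^2, so alpha_hat = y_hat / (|x_hat|^2 + lam).
    """
    x_hat, y_hat = dft(x), dft(y)
    alpha_hat = [yk / (abs(xk) ** 2 + lam) for xk, yk in zip(x_hat, y_hat)]
    return x_hat, alpha_hat


def detect(model, z):
    """Filter response for every cyclic shift of the new patch z (real part)."""
    x_hat, alpha_hat = model
    z_hat = dft(z)
    # Linear-kernel cross-correlation between template x and patch z.
    resp_hat = [xk.conjugate() * zk * ak
                for xk, zk, ak in zip(x_hat, z_hat, alpha_hat)]
    return [r.real for r in idft(resp_hat)]


# Usage: train on x, then track x after it moves 5 samples to the right.
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(64)]
model = train(x, gaussian_label(64))
z = [x[(t - 5) % 64] for t in range(64)]
resp = detect(model, z)
shift = max(range(64), key=resp.__getitem__)  # the peak index equals the shift, 5
```

With a linear kernel this reduces to the same closed form as a MOSSE-style filter; KCF's Gaussian kernel and multi-channel HOG features change only how the kernel correlation terms are computed, not the per-frequency structure.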
To evaluate tracking performance, one-pass evaluation (OPE) is used on the OTB2015 [31] dataset. The OPE strategy has two evaluation criteria: distance precision rate (DPR) and overlap success rate (OSR). The distance precision rate is the percentage of frames in which the center location error between the predicted position and the ground truth falls below a given threshold, evaluated over a range of thresholds. The center location error is the Euclidean distance between the estimated position $(x, y)$ obtained by iteration and the ground-truth position $(x', y')$, which can be calculated using the formula:

$$\mathrm{CLE} = \sqrt{(x - x')^2 + (y - y')^2}$$
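The two OPE criteria can be sketched as follows, assuming boxes in `(x, y, w, h)` form (top-left corner plus size) and using intersection-over-union as the overlap measure, which is the standard OTB choice. The single thresholds of 20 pixels and 0.5 are the customary OTB defaults and are assumptions here; sweeping the threshold yields the precision and success curves, and OTB commonly summarizes the success curve by its area under the curve. All function names are illustrative.

```python
import math


def center(box):
    """Centre of an (x, y, w, h) box given by its top-left corner and size."""
    x, y, w, h = box
    return x + w / 2, y + h / 2


def center_error(pred, gt):
    """Euclidean distance between predicted and ground-truth box centres."""
    (px, py), (gx, gy) = center(pred), center(gt)
    return math.hypot(px - gx, py - gy)


def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def distance_precision(preds, gts, threshold=20.0):
    """Fraction of frames whose centre error is within `threshold` pixels."""
    errors = [center_error(p, g) for p, g in zip(preds, gts)]
    return sum(e <= threshold for e in errors) / len(errors)


def overlap_success(preds, gts, threshold=0.5):
    """Fraction of frames whose IoU with the ground truth exceeds `threshold`."""
    overlaps = [iou(p, g) for p, g in zip(preds, gts)]
    return sum(o > threshold for o in overlaps) / len(overlaps)
```

For example, with ground truth `(0, 0, 10, 10)` in three frames and predictions shifted by 0, 5, and 30 pixels horizontally, the centre errors are 0, 5, and 30, so the distance precision at 20 pixels is 2/3, while the overlaps are 1, 1/3, and 0, so the success rate at 0.5 is 1/3.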
Summary
Visual object tracking [1,2,3,4] is one of the most fundamental and challenging research problems in computer vision; it combines techniques from several fields, such as image processing, pattern recognition, and computer applications. The essence of video moving-target tracking is to analyze the captured image sequence through image processing techniques: features of the specific target, such as its overall or partial edges, texture, shape, contrast, and brightness, are extracted and analyzed. When the target scale keeps increasing, the convolution computation for extracting target features and training filters grows, which reduces tracking speed. To place more emphasis on target samples than on background samples, Yuan et al. [18] designed a target-focusing convolutional regression model for visual object tracking. To overcome the limitations of representing the target with a single feature, several tracking methods based on multiple-feature fusion were designed [20,21,22,23], which improve the robustness of the algorithm to a certain extent.
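One simple way to fuse several features is to combine their response maps before locating the peak. The sketch below assumes min-max normalization and fixed weights; it is a generic illustration, not the specific fusion scheme of [20,21,22,23] or of the proposed tracker, and `normalize`, `fuse`, and `argmax2d` are hypothetical names.

```python
# Hypothetical sketch: weighted fusion of per-feature response maps.
# Min-max normalization and fixed weights are assumptions.


def normalize(resp):
    """Min-max scale a 2-D response map to [0, 1] so features are comparable."""
    lo = min(min(row) for row in resp)
    hi = max(max(row) for row in resp)
    span = hi - lo
    return [[(v - lo) / span if span else 0.0 for v in row] for row in resp]


def fuse(responses, weights):
    """Weighted sum of normalized response maps of identical shape."""
    norm = [normalize(r) for r in responses]
    rows, cols = len(responses[0]), len(responses[0][0])
    return [[sum(w * m[i][j] for w, m in zip(weights, norm)) for j in range(cols)]
            for i in range(rows)]


def argmax2d(resp):
    """(row, col) of the largest entry; ties resolve to the first in row-major order."""
    return max(((i, j) for i in range(len(resp)) for j in range(len(resp[0]))),
               key=lambda ij: resp[ij[0]][ij[1]])


# Usage: a handcrafted-feature map and a deep-feature map disagree on the peak;
# the weights decide which cue dominates.
hog = [[0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
cnn = [[0.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
position = argmax2d(fuse([hog, cnn], [0.3, 0.7]))
```

Normalizing first matters because raw response magnitudes from different features are on different scales; without it, the feature with the largest raw values would dominate regardless of the weights.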