Abstract

This study introduces a technique for improving object tracking algorithms that uses the dual-tree complex wavelet transform to extract hierarchical convolutional features from images. The method replaces the image template at the input of VGG-Net-19 with wavelet coefficients and generates hierarchical convolutional features from five layers. The last convolutional layers provide robust semantic information about targets and tolerate appearance variations, but their spatial resolution is too coarse for precise localization. Conversely, earlier convolutional layers localize more accurately but are vulnerable to appearance changes. The study treats the hierarchy of convolutional layers as a nonlinear counterpart of an image pyramid representation and exploits these multiple levels of abstraction for visual tracking. The proposed method learns an adaptive correlation filter on each convolutional layer to encode the target appearance, then hierarchically infers the maximum response of each layer to locate the target. The approach is evaluated with specific performance metrics on benchmark datasets, including OTB50, OTB100, TC128, and UAV20. The results show that it outperforms state-of-the-art tracking methods, indicating its efficacy in addressing the challenges of visual object tracking.
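
The per-layer correlation filters and the fusion of their responses can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: random arrays stand in for the VGG-Net-19 feature maps computed from wavelet coefficients, all layers are assumed to be resized to the same spatial size, and the weighted-sum fusion with weights `(0.25, 0.5, 1.0)` is an assumed simplification of the hierarchical inference step. The function names (`gaussian_label`, `train_filter`, `response_map`, `locate`) are hypothetical.

```python
import numpy as np

def gaussian_label(h, w, sigma=2.0):
    # Desired filter output: a Gaussian whose peak sits at index (0, 0),
    # measured with circular wrap-around, so "no motion" maps to the origin.
    dy = np.minimum(np.arange(h), h - np.arange(h))
    dx = np.minimum(np.arange(w), w - np.arange(w))
    return np.exp(-(dy[:, None] ** 2 + dx[None, :] ** 2) / (2.0 * sigma ** 2))

def train_filter(feat, label, lam=1e-4):
    # feat: (channels, H, W) feature map of the target patch from one layer.
    # Ridge regression solved in closed form, independently per frequency.
    X = np.fft.fft2(feat, axes=(-2, -1))
    Y = np.fft.fft2(label)
    energy = np.sum(X * np.conj(X), axis=0).real
    return Y[None] * np.conj(X) / (energy[None] + lam)

def response_map(filt, feat):
    # Correlate the learned filter with features of the search patch.
    Z = np.fft.fft2(feat, axes=(-2, -1))
    return np.fft.ifft2(np.sum(filt * Z, axis=0)).real

def locate(filters, feats, weights):
    # ASSUMPTION: layers are fused by a weighted sum of their response maps,
    # with larger weights on deeper (more semantic) layers.
    h, w = feats[0].shape[-2:]
    fused = sum(wt * response_map(f, z)
                for f, z, wt in zip(filters, feats, weights))
    row, col = np.unravel_index(np.argmax(fused), fused.shape)
    # Convert wrapped peak indices into signed displacements.
    row = int(row) - h if row > h // 2 else int(row)
    col = int(col) - w if col > w // 2 else int(col)
    return row, col

# Synthetic stand-ins for three layers (shallow to deep), NOT real CNN features.
rng = np.random.default_rng(0)
train_feats = [rng.standard_normal((c, 32, 32)) for c in (4, 8, 16)]
label = gaussian_label(32, 32)
filters = [train_filter(f, label) for f in train_feats]

# Simulate target motion by circularly shifting every layer's features.
test_feats = [np.roll(f, shift=(3, -5), axis=(-2, -1)) for f in train_feats]
shift = locate(filters, test_feats, weights=(0.25, 0.5, 1.0))
print(shift)
```

In this toy setting the estimated displacement matches the simulated shift of 3 rows and -5 columns. With real features, the deep layers would dominate the coarse position estimate while the shallow layers refine it.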
