Abstract

Recent advances in visual tracking have been driven by the combination of Siamese networks and region proposal networks, which achieve competitive accuracy while remaining computationally efficient. However, these approaches often suffer from parameter redundancy and additional computational cost owing to their use of anchor boxes or multiscale pyramids, leaving room for improvement. In this study, a novel extreme point graph-guided tracking approach called SiamEXTR is presented, which tracks generic target objects by detecting five keypoints: four target-specific extreme points (the topmost, bottommost, leftmost, and rightmost points) and a center point. It requires neither region classification nor bounding box regression. To make extreme point detection more robust, a new oriented pooling strategy called extreme-pooling is proposed, which captures more discriminative global and local information to help each pixel predict its category. In addition, a U-shaped backbone network is designed to preserve fine-grained visual details and stronger semantic information at high resolution, so that the detection granularity of the extreme point graph is as close to the subpixel level as possible. From the detected extreme point graph, the proposed approach not only generates axis-aligned bounding boxes for object annotation, but also provides more accurate octagonal object segmentation masks through a simple approximation strategy. Without bells and whistles, extensive experiments and comparisons on several large-scale benchmark datasets demonstrated that the SiamEXTR tracker consistently achieved competitive performance while running at well over 140 frames per second. The authors hope that the concept behind this approach will serve as a new baseline and encourage further development in the visual tracking community.
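The step from an extreme point graph to a box and an octagonal mask can be illustrated with a minimal sketch. The abstract does not specify the "simple approximation strategy", so the octagon construction below is an assumption: each extreme point is extended along its box edge into a short segment (here one quarter of the edge length, clamped at the corners), and the eight segment endpoints are joined. The function names and the `frac` parameter are hypothetical and are not taken from SiamEXTR.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in image coordinates, y grows downward


def bbox_from_extremes(top: Point, bottom: Point, left: Point, right: Point):
    """Axis-aligned box (x_min, y_min, x_max, y_max) spanned by the extreme points."""
    return (left[0], top[1], right[0], bottom[1])


def _clamp(value: float, low: float, high: float) -> float:
    return max(low, min(high, value))


def octagon_from_extremes(top: Point, bottom: Point, left: Point, right: Point,
                          frac: float = 0.25) -> List[Point]:
    """Approximate an octagonal outline from four extreme points.

    Assumption (not from the abstract): each extreme point is widened into a
    segment covering `frac` of its box edge, truncated at the box corners.
    Vertices are returned clockwise on screen, starting on the top edge.
    """
    x_min, y_min, x_max, y_max = bbox_from_extremes(top, bottom, left, right)
    half_w = frac * (x_max - x_min) / 2.0  # half segment length on horizontal edges
    half_h = frac * (y_max - y_min) / 2.0  # half segment length on vertical edges

    def cx(x: float) -> float:
        return _clamp(x, x_min, x_max)

    def cy(y: float) -> float:
        return _clamp(y, y_min, y_max)

    return [
        (cx(top[0] - half_w), y_min), (cx(top[0] + half_w), y_min),        # top edge
        (x_max, cy(right[1] - half_h)), (x_max, cy(right[1] + half_h)),    # right edge
        (cx(bottom[0] + half_w), y_max), (cx(bottom[0] - half_w), y_max),  # bottom edge
        (x_min, cy(left[1] + half_h)), (x_min, cy(left[1] - half_h)),      # left edge
    ]
```

The resulting polygon can be rasterized into a binary mask with any polygon-fill routine. The center point, the fifth keypoint mentioned above, is not used in this sketch.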
