Visual object tracking is one of the most active research topics in computer vision. Given the target object's location in the first frame of a video sequence, the goal is to estimate its location in every subsequent frame. Recent advances in deep neural networks, in particular Siamese networks, have had a significant impact on visual object tracking. Despite their high accuracy on academic benchmarks, current state-of-the-art approaches are compute-intensive and have a large memory footprint, and thus cannot satisfy the performance requirements of real-world applications. The aim of this paper is to design a new lightweight framework for resource-efficient and accurate visual object tracking. In addition, we introduce a new tracker efficiency benchmark and protocol, where efficiency is defined in terms of both energy consumption and execution speed on edge devices. We develop a novel dual-template representation for object model adaptation. The first template is static: it preserves the original appearance of the object and thus prevents drift and, consequently, failures caused by adaptation. The second template is dynamic: it adapts to the current tracking conditions and reflects the object's present appearance. Unlike STARK, which incorporates temporal information by introducing a separate score prediction head, we introduce a parameter-free similarity module as a template update rule, optimized end-to-end with the rest of the network. We show that a learned convex combination of the two templates is effective for tracking on multiple benchmarks. We propose a lightweight tracker that incorporates the dual-template representation and pixel-wise fusion blocks into a compact network. The resulting FEAR-XS tracker runs at 205 FPS on iPhone 11, 4.2 times faster than LightTrack and 26.6 times faster than Ocean, while maintaining high accuracy on multiple benchmarks: no state-of-the-art tracker is both more accurate and faster than any FEAR tracker. In addition, the algorithm is highly energy efficient.
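To make the dual-template mechanism concrete, here is a minimal sketch of the learned convex combination; the symbols below are our own illustrative notation, not taken from the paper. With static template features $T_s$, dynamic template features $T_d$, and a single learnable scalar $w$, the fused template can be written as

$$\tilde{T} = \bigl(1 - \sigma(w)\bigr)\,T_s + \sigma(w)\,T_d,$$

where $\sigma$ denotes the sigmoid, so the two weights are non-negative and sum to one, making the combination convex. Under this reading, the dynamic template $T_d$ would be refreshed by the parameter-free similarity module described above, while the static template $T_s$ anchors the model to the object's original appearance.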