Abstract

Siamese networks have been extensively studied in recent years. Most previous research focuses on improving accuracy, while only a few works recognize the need to reduce parameter redundancy and computational load. Even less work has addressed runtime memory cost during network design, which makes Siamese-network-based trackers difficult to deploy on edge devices. In this paper, we present SiamMixer, a lightweight and hardware-friendly visual object-tracking network. It uses patch-by-patch inference to reduce memory use in shallow layers, where each small image region is processed individually. It merges and globally encodes feature maps in deep layers to enhance accuracy. Benefiting from these techniques, SiamMixer achieves accuracy comparable to that of other large trackers with only 286 kB of parameters and 196 kB of extra memory for feature maps. Additionally, we verify the impact of various activation functions and replace all activation functions in SiamMixer with ReLU, which reduces the cost of deployment on mobile devices.
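The memory argument behind patch-by-patch inference can be illustrated with a small NumPy sketch. It assumes a hypothetical shallow stage of two unpadded 3x3 convolutions with ReLU (not SiamMixer's actual layers) and shows that computing the output tile by tile, where each tile reads its input region plus a halo of one pixel per side per layer, reproduces whole-image inference while the largest intermediate feature map shrinks. The tiles are then merged into one map, and `mix_globally` applies an assumed MLP-Mixer-style token and channel mixing purely to illustrate what "globally encodes" can mean; SiamMixer's real Mixing Module, layer sizes, and tile sizes may differ. All function names are hypothetical.

```python
import numpy as np

def conv3x3_relu(x, w):
    """Valid 3x3 convolution (cross-correlation, no padding) followed by ReLU.
    x: (C_in, H, W); w: (C_out, C_in, 3, 3) -> (C_out, H - 2, W - 2)."""
    _, h, wd = x.shape
    kh, kw = w.shape[2:]
    ho, wo = h - kh + 1, wd - kw + 1
    out = np.zeros((w.shape[0], ho, wo))
    for i in range(kh):
        for j in range(kw):
            # Accumulate the contribution of kernel tap (i, j) at every output position
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], x[:, i:i + ho, j:j + wo])
    return np.maximum(out, 0.0)

def shallow_stage(x, weights):
    """Run the conv stack; also report the largest intermediate map (in elements)."""
    largest = 0
    for w in weights:
        x = conv3x3_relu(x, w)
        largest = max(largest, x.size)
    return x, largest

def patch_by_patch(x, weights, tile):
    """Compute the same output tile by tile. Each valid 3x3 conv shrinks H and W
    by 2, so an output tile needs its input region plus that halo."""
    shrink = 2 * len(weights)
    ho, wo = x.shape[1] - shrink, x.shape[2] - shrink
    out = np.zeros((weights[-1].shape[0], ho, wo))   # merged (full) output map
    peak = 0
    for r in range(0, ho, tile):
        for c in range(0, wo, tile):
            th, tw = min(tile, ho - r), min(tile, wo - c)
            region = x[:, r:r + th + shrink, c:c + tw + shrink]
            y, largest = shallow_stage(region, weights)
            out[:, r:r + th, c:c + tw] = y           # write tile into merged map
            peak = max(peak, largest)
    return out, peak

def mix_globally(fmap, w_tok1, w_tok2, w_ch1, w_ch2):
    """Assumed MLP-Mixer-style global encoding (not SiamMixer's exact module):
    token mixing lets every channel see all spatial positions, then channel mixing."""
    c, h, w = fmap.shape
    x = fmap.reshape(c, h * w).T                      # (N tokens, C channels)
    x = x + w_tok2 @ np.maximum(w_tok1 @ x, 0.0)      # token mixing across positions
    x = x + np.maximum(x @ w_ch1, 0.0) @ w_ch2        # channel mixing per position
    return x.T.reshape(c, h, w)

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 36, 36))
weights = [0.3 * rng.standard_normal((8, 3, 3, 3)),
           0.3 * rng.standard_normal((8, 8, 3, 3))]
full, full_peak = shallow_stage(x, weights)
tiled, tile_peak = patch_by_patch(x, weights, tile=8)
print(np.allclose(full, tiled), full_peak, tile_peak)  # True 9248 800

n = tiled.shape[1] * tiled.shape[2]
mixed = mix_globally(tiled,
                     0.01 * rng.standard_normal((16, n)), 0.01 * rng.standard_normal((n, 16)),
                     0.1 * rng.standard_normal((8, 32)), 0.1 * rng.standard_normal((32, 8)))
```

In this toy setting the largest intermediate map drops from 9248 to 800 elements, because only one 12x12 input region is live at a time. The count covers intermediate activations only; the merged output map is still stored in full, which is why merging is deferred to the deeper, smaller layers.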

Highlights

  • Accepted: 14 February 2022. Visual object-tracking plays an essential role in many fields, such as surveillance, machine vision, and human–computer interaction [1]. Discriminative Correlation Filters (DCFs) and Siamese networks are currently the dominant tracking models

  • We propose a novel lightweight and hardware-friendly visual object-tracking model based on the Siamese tracking scheme, namely SiamMixer

  • We build a lightweight target-tracking algorithm, SiamMixer, by constructing a lightweight backbone network


Summary

Introduction

Visual object-tracking is a fundamental problem in computer vision, whose goal is to locate the target in subsequent video frames based on its position in the initial frame. The Siamese network tracker treats visual target tracking as a similarity learning problem. It eliminates the need for complex descriptor design and is trained on large amounts of labeled data to distinguish targets from the background. Thanks to the learning and generalization ability of neural networks, the Siamese network tracker can track targets that do not appear in the training set. DIMP [8] uses online template updates in the Siamese network and achieves state-of-the-art performance. Although these methods improve tracking accuracy and robustness, they ignore computational overhead and memory footprint, which limits their application on mobile devices.
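Treating tracking as similarity learning means that the template and the search region pass through the same (shared-weight) embedding, and the template features are slid over the search features to produce a response map whose peak marks the target. The NumPy sketch below illustrates this under stated assumptions: `embed` is a hypothetical 1x1 projection plus ReLU standing in for a real backbone, and each offset is scored with cosine-normalized correlation so that an exact match is guaranteed to score highest. SiamFC-style trackers typically use plain cross-correlation, and SiamMixer's own target-locating head may differ.

```python
import numpy as np

def embed(img, proj):
    """Hypothetical shared embedding phi: 1x1 conv (channel projection) + ReLU.
    img: (C_in, H, W); proj: (C_out, C_in) -> (C_out, H, W)."""
    return np.maximum(np.einsum("oc,chw->ohw", proj, img), 0.0)

def response_map(z_feat, x_feat, eps=1e-8):
    """Slide template features over search features and score every offset
    by cosine similarity over all channels (normalized cross-correlation)."""
    _, hz, wz = z_feat.shape
    _, hx, wx = x_feat.shape
    z = z_feat.ravel()
    z_norm = np.linalg.norm(z)
    out = np.empty((hx - hz + 1, wx - wz + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            win = x_feat[:, r:r + hz, c:c + wz].ravel()
            out[r, c] = z @ win / (z_norm * np.linalg.norm(win) + eps)
    return out

rng = np.random.default_rng(1)
proj = rng.standard_normal((16, 3))          # same weights for both branches
template = rng.standard_normal((3, 8, 8))
search = rng.standard_normal((3, 32, 32))
search[:, 10:18, 5:13] = template            # paste the target at row 10, col 5

resp = response_map(embed(template, proj), embed(search, proj))
peak = np.unravel_index(np.argmax(resp), resp.shape)
print(peak)  # target located at row 10, col 5
```

Because both branches share `proj`, the features of the pasted region match the template features, so the response peaks at the pasted offset; in a trained tracker the same mechanism lets the network generalize to targets it never saw during training.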

Trackers Based on Siamese Network
Lightweight Network Structure Design
Proposed Algorithm
Convolutional Layer
Mixing Module
Target Locating
Training Setup
Datasets and Evaluation Metrics
Experiment Results
Ablation Analysis
Storage and Analysis
Conclusions