Visual Tracking by Adaptive Continual Meta-Learning

Sungyong Baik, Kyoung Mu Lee, Junseok Kwon, Janghoon Choi, Myungsub Choi

doi:10.1109/access.2022.3143809

Open Access

https://doi.org/10.1109/access.2022.3143809

Abstract

We formulate visual tracking as a semi-supervised continual learning problem in which only the initial frame is labeled. In contrast to conventional meta-learning-based approaches, which regard visual tracking as an instance detection problem and focus on finding good weights for model initialization, we consider both the initialization and online update processes simultaneously under our adaptive continual meta-learning framework. The proposed adaptive meta-learning strategy dynamically generates the hyperparameters needed for fast initialization and online update, achieving greater robustness by adaptively regulating the learning process. In addition, our continual meta-learning approach, based on a knowledge distillation scheme, helps the tracker adapt to new examples while retaining its knowledge of previously seen examples. We apply the proposed framework to a deep learning-based tracking algorithm and obtain noticeable performance gains and competitive results against recent state-of-the-art tracking algorithms while running at real-time speeds.

Highlights

Visual tracking, which is one of the fundamental computer vision problems, has seen practical applications in robotics, automated surveillance, and autonomous driving
With the advances in the application of deep convolutional neural networks (CNNs) to image classification and object detection tasks [1]–[4], visual tracking algorithms have achieved large improvements in performance, owing to the representation power of their deep backbone networks [5]
The associate editor coordinating the review of this manuscript and approving it for publication was Inês Domingues
There is a misalignment between the goals of object detection and visual tracking: object detection aims to locate all objects of the same semantic class, whereas visual tracking aims to locate a specific object instance

Summary

Introduction

Visual tracking, one of the fundamental computer vision problems, has seen practical applications in robotics, automated surveillance, and autonomous driving. There is a misalignment between the goals of object detection and visual tracking: object detection aims to locate all objects of the same semantic class, whereas visual tracking aims to locate a specific object instance. To close this gap, visual tracking algorithms apply some form of domain adaptation to the object detection framework, such as online network fine-tuning using stochastic gradient descent (SGD)-based methods [5], [9]–[11], or a Siamese network structure [6], [7], [12]–[14], which generates a target-specific convolutional kernel from the initial frame.
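The Siamese idea mentioned above can be illustrated with a minimal sketch. This is not the method proposed in the paper; it only shows how a kernel taken from the initial frame might be cross-correlated with the features of a later frame to locate the target. The function names `cross_correlate` and `locate_target` are hypothetical, and the feature maps are assumed to be plain NumPy arrays of shape (channels, height, width) rather than the output of a real CNN backbone.

```python
import numpy as np

def cross_correlate(search_feat, kernel):
    """Slide `kernel` over `search_feat` (both C x H x W) and return a 2-D response map."""
    _, kh, kw = kernel.shape
    _, sh, sw = search_feat.shape
    response = np.empty((sh - kh + 1, sw - kw + 1))
    for y in range(response.shape[0]):
        for x in range(response.shape[1]):
            # Inner product between the kernel and the window at offset (y, x).
            response[y, x] = np.sum(search_feat[:, y:y + kh, x:x + kw] * kernel)
    return response

def locate_target(search_feat, kernel):
    """Return the (row, col) offset with the highest response."""
    response = cross_correlate(search_feat, kernel)
    row, col = np.unravel_index(np.argmax(response), response.shape)
    return int(row), int(col)

# Toy usage: the "target" features from the initial frame act as the kernel,
# and the same pattern is placed at a known offset in an otherwise empty frame.
rng = np.random.default_rng(0)
target_kernel = rng.normal(size=(2, 3, 3))
later_frame = np.zeros((2, 8, 8))
later_frame[:, 3:6, 4:7] = target_kernel
estimated_offset = locate_target(later_frame, target_kernel)
```

In this toy setting the response peaks where the window coincides with the embedded pattern. Real trackers compute both inputs with a shared (Siamese) network and use a batched convolution instead of explicit loops.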

Objectives

Methods

Conclusion