Abstract

Template-based learning, particularly with Siamese networks, has recently become popular because it balances accuracy and speed. However, preserving tracker robustness in challenging scenarios while maintaining real-time speed remains a primary concern in visual object tracking. Siamese trackers struggle to handle continual target appearance changes because they learn limited discriminative ability between target and background information. This paper presents stacked channel-spatial attention within Siamese networks to improve tracker robustness without sacrificing fast tracking speed. The proposed channel attention strengthens target-specific channels by increasing their weights while down-weighting irrelevant channels. Spatial attention focuses on the most informative regions of the target feature map. We integrate the proposed channel and spatial attention modules to enhance tracking performance through end-to-end learning. The proposed tracking framework learns what and where to highlight in the target information for efficient tracking. Experimental results on the widely used OTB100, OTB50, VOT2016, VOT2017/18, TC-128, and UAV123 benchmarks verify that the proposed tracker achieves outstanding performance compared with state-of-the-art trackers.
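As a rough illustration of how the attended features would be used in a Siamese pipeline, the sketch below refines both the exemplar (template) and search-region features with the same attention block and then cross-correlates them to produce a similarity response map. This is a minimal sketch, not the authors' implementation: the feature shapes, the identity placeholder for the attention block, and the naive SiamFC-style cross-correlation are all assumptions.

```python
import torch
import torch.nn.functional as F


def siamese_response(template_feat, search_feat, attention_block):
    """Apply a (shared) attention block to exemplar and search features,
    then cross-correlate them to obtain a similarity response map.

    Shapes are illustrative: template (1, C, h, w), search (1, C, H, W).
    """
    z = attention_block(template_feat)  # refined "what/where" exemplar features
    x = attention_block(search_feat)    # refined search-region features
    # Naive SiamFC-style cross-correlation: the exemplar acts as a conv kernel.
    return F.conv2d(x, z)               # (1, 1, H - h + 1, W - w + 1)


# Quick check with random features and an identity placeholder for attention.
z_feat = torch.randn(1, 256, 6, 6)
x_feat = torch.randn(1, 256, 22, 22)
score = siamese_response(z_feat, x_feat, torch.nn.Identity())
print(score.shape)  # torch.Size([1, 1, 17, 17])
```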

Highlights

  • Visual object tracking is a fundamental and challenging task for a wide range of computer vision applications, including intelligent surveillance [1], autonomous vehicles [2], game analysis [3], and human-computer interface [4]

  • We were inspired by human visual perception, which does not require concentrating on the whole scene but rather focuses on a specific object, perceiving its informative parts to understand the appropriate visual pattern [52]

  • The global max-pooling operation focuses on distinctive, finer object features, whereas global average pooling provides overall knowledge of the feature map for channel attention. After computing both pooling operations, we pass each pooled descriptor through a multilayer perceptron (MLP) with a rectified linear unit (ReLU) layer that learns the non-linearity between two fully-connected layers with 128 and 512 nodes, respectively (see the sketch after this list)

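Based on the description above, a minimal PyTorch sketch of the stacked channel-spatial attention module could look like the following. It is an illustration under assumptions: the 512-channel input is inferred from the 512-node fully-connected layer, the MLP is shared between the max- and average-pooled descriptors, the spatial branch follows the common max/average-plus-convolution design, and the channel-then-spatial stacking order is assumed.

```python
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Channel attention ("what"): max- and average-pooled descriptors pass
    through an MLP (ReLU between two FC layers with 128 and 512 nodes,
    assumed shared) and are fused into per-channel weights."""

    def __init__(self, channels=512, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, hidden),   # 512 -> 128
            nn.ReLU(inplace=True),
            nn.Linear(hidden, channels),   # 128 -> 512
        )

    def forward(self, x):                               # x: (N, C, H, W)
        n, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))              # global average pooling
        mx = self.mlp(x.amax(dim=(2, 3)))               # global max pooling
        w = torch.sigmoid(avg + mx).view(n, c, 1, 1)    # per-channel weights
        return x * w


class SpatialAttention(nn.Module):
    """Spatial attention ("where"), assumed design: channel-wise average and
    max maps are concatenated and convolved into one spatial weight map."""

    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)               # (N, 1, H, W)
        mx = x.amax(dim=1, keepdim=True)                # (N, 1, H, W)
        w = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * w


class StackedChannelSpatialAttention(nn.Module):
    """Channel attention first, then spatial attention (assumed order)."""

    def __init__(self, channels=512):
        super().__init__()
        self.channel = ChannelAttention(channels)
        self.spatial = SpatialAttention()

    def forward(self, x):
        return self.spatial(self.channel(x))
```

Applied to the backbone features of both Siamese branches before cross-correlation, a module like this would realize the "what and where" refinement described in the abstract.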

Summary

INTRODUCTION

Visual object tracking is a fundamental and challenging task for a wide range of computer vision applications, including intelligent surveillance [1], autonomous vehicles [2], game analysis [3], and human-computer interface [4]. Several Siamese-based trackers have been proposed to address these challenges. For example, DSiam [7] learns background suppression and appearance variations from earlier frames using a fast transformation learning model, whereas DCFNet [6] integrates a discriminant correlation filter (DCF) within a lightweight architecture and drives back-propagation to adjust the DCF layer using the probability heat map of the target location. Models and results are available at https://github.com/maklachur/SCSAtt.

RELATED WORK
SIAMESE NETWORK FOR FEATURE LEARNING
STACKED CHANNEL-SPATIAL ATTENTION
IMPLEMENTATION DETAILS
EXPERIMENTS
Findings
CONCLUSION