Abstract

Deep learning-based visual trackers have the potential to provide good performance for object tracking. Most of them use hierarchical features learned from multiple layers of a deep network. However, issues related to deterministic aggregation of these features from various layers, difficulties in estimating variations in scale or rotation of the object being tracked, as well as challenges in effectively modeling the object's appearance over long time periods leave substantial scope to improve performance. In this paper, we propose a tracker that learns correlation filters over features from multiple layers of a VGG network. A correlation filter for an individual layer is used to predict the target location. We adaptively learn the contribution of an ensemble of correlation filters for the final location estimation using an LSTM. An adaptive approach is advantageous because different layers encode diverse feature representations, and a uniform contribution would not fully exploit this complementary information. To this end, we use an LSTM, as it encodes the interactions of past appearances, which is useful for tracking. Further, the scale and rotation parameters are estimated using dedicated correlation filters. Additionally, an appearance model pool is used to prevent the correlation filters from drifting. Experimental results on five public datasets, namely the Object Tracking Benchmark (OTB100), the Visual Object Tracking (VOT) Benchmark 2016, the VOT Benchmark 2017, the Tracking Dataset, and the UAV123 dataset, reveal that our approach outperforms state-of-the-art approaches for object tracking.
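To illustrate the adaptive aggregation described above, the following is a minimal sketch, not the authors' implementation, of weighting per-layer correlation-filter responses with an LSTM. The module, response maps, and history statistics (AdaptiveResponseFusion, random placeholder tensors, peak-value histories) are hypothetical stand-ins assumed here for illustration.

```python
# Minimal sketch: fuse per-layer correlation responses with LSTM-predicted weights.
# All names and shapes are illustrative assumptions, not the paper's exact design.
import torch
import torch.nn as nn

class AdaptiveResponseFusion(nn.Module):
    """Predicts per-layer contribution weights from a history of response statistics."""
    def __init__(self, num_layers=3, hidden_size=32):
        super().__init__()
        # Each time step sees one scalar statistic (e.g. peak response) per layer.
        self.lstm = nn.LSTM(input_size=num_layers, hidden_size=hidden_size,
                            batch_first=True)
        self.head = nn.Linear(hidden_size, num_layers)

    def forward(self, stats_history):
        # stats_history: (batch, time, num_layers) response statistics from past frames
        out, _ = self.lstm(stats_history)
        # Softmax so the per-layer contributions sum to one.
        return torch.softmax(self.head(out[:, -1]), dim=-1)

# Hypothetical correlation responses for three VGG layers (batch x H x W),
# assumed to come from per-layer correlation filters applied to the search region.
responses = [torch.rand(1, 31, 31) for _ in range(3)]

# Placeholder peak-response history over the last 5 frames for the 3 layers.
history = torch.rand(1, 5, 3)

fusion = AdaptiveResponseFusion(num_layers=3)
weights = fusion(history)                           # (1, 3) adaptive layer weights
fused = sum(w * r for w, r in zip(weights[0], responses))
peak = torch.nonzero(fused == fused.max())[0]       # predicted target location
print("layer weights:", weights.detach().numpy(), "peak at:", peak.tolist())
```

The softmax head keeps the ensemble weights normalized, so layers whose past responses were more reliable dominate the fused map, while the LSTM state carries the history of appearances across frames.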