Abstract

Deep learning algorithms provide visual tracking robustness at an unprecedented level, but realizing an acceptable performance is still challenging because of the natural continuous changes in the features of foreground and background objects over videos. One of the factors that most affects the robustness of tracking algorithms is the choice of network architecture parameters, especially the depth. A robust visual tracking model using a very deep generator (RTDG) was proposed in this study. We constructed our model on an ordinary convolutional neural network (CNN), which consists of feature extraction and binary classifier networks. We integrated a generative adversarial network (GAN) into the CNN to enhance the tracking results through an adversarial learning process performed during the training phase. We used the discriminator as a classifier and the generator as a store that produces unlabeled feature-level data with different appearances by applying masks to the extracted features. In this study, we investigated the role of increasing the number of fully connected (FC) layers in adversarial generative networks and their impact on robustness. We used a very deep FC network with 22 layers as a high-performance generator for the first time. This generator is used via adversarial learning to augment the positive samples to reduce the gap between the hungry deep learning algorithm and the available training data to achieve robust visual tracking. The experiments showed that the proposed framework performed well against state-of-the-art trackers on OTB-100, VOT2019, LaSOT and UAVDT benchmark datasets.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call