Despite their remarkable empirical success in UAV object tracking, current convolutional networks suffer from three limitations: (1) The feature maps produced by convolutional layers are difficult to interpret. (2) The network must be trained offline on a large-scale auxiliary training set, so the feature extraction ability of the trained network depends on the categories in that set. (3) When the network requires online fine-tuning, its performance is sensitive to hyper-parameters such as the learning rate and weight decay. To overcome these limitations, this paper proposes a Discriminative Sparse Convolutional Network (DSCN) that exhibits good layer-wise interpretability and can be trained online without any auxiliary training data. By imposing sparsity constraints on the convolutional kernels, DSCN gives each convolutional layer an explicit data meaning, which enhances the interpretability of the feature maps. The convolutional kernels are learned online directly from image blocks, which eliminates offline training on auxiliary datasets. Moreover, a simple yet effective online tuning method with few hyper-parameters is proposed to fine-tune the fully connected layers. We apply DSCN to UAV object tracking and conduct extensive experiments on six mainstream UAV datasets. The experimental results demonstrate that our method performs favorably against several state-of-the-art tracking algorithms in terms of tracking accuracy and robustness.
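To make the idea of learning sparse convolutional kernels online from image blocks more concrete, the following is a minimal sketch and not the DSCN algorithm itself, whose details the abstract does not give. It assumes a grayscale frame, swaps in a simple spherical k-means step followed by soft-thresholding as a stand-in for the sparsity constraint, and uses hypothetical names (`extract_patches`, `learn_sparse_kernels`, `feature_maps`, `ratio`) chosen only for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def extract_patches(image, size, stride):
    """Collect size x size blocks from a 2-D grayscale image as flat rows."""
    h, w = image.shape
    rows = [
        image[r:r + size, c:c + size].ravel()
        for r in range(0, h - size + 1, stride)
        for c in range(0, w - size + 1, stride)
    ]
    return np.asarray(rows, dtype=np.float64)


def soft_threshold(x, lam):
    """Proximal operator of the L1 norm: shrinks small entries to exactly zero."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)


def learn_sparse_kernels(patches, n_kernels, ratio=0.3, n_iter=10, seed=0):
    """Illustrative stand-in (assumption): k-means centroids made sparse by shrinkage."""
    rng = np.random.default_rng(seed)
    # Remove mean brightness and scale every patch to unit norm.
    x = patches - patches.mean(axis=1, keepdims=True)
    x = x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-8)
    # Initialise kernels from randomly chosen patches of the current frame.
    kernels = x[rng.choice(len(x), n_kernels, replace=False)].copy()
    for _ in range(n_iter):
        # Assign each patch to the kernel it correlates with most strongly.
        assign = np.argmax(x @ kernels.T, axis=1)
        for k in range(n_kernels):
            members = x[assign == k]
            if len(members):
                kernels[k] = members.mean(axis=0)
        # Threshold relative to each kernel's peak so a kernel is never wiped out.
        lam = ratio * np.abs(kernels).max(axis=1, keepdims=True)
        kernels = soft_threshold(kernels, lam)
        kernels = kernels / (np.linalg.norm(kernels, axis=1, keepdims=True) + 1e-8)
    return kernels


def feature_maps(image, kernels, size):
    """Valid cross-correlation of the image with every learned kernel."""
    windows = sliding_window_view(image, (size, size))
    return np.einsum("ijuv,kuv->kij", windows, kernels.reshape(-1, size, size))


if __name__ == "__main__":
    frame = np.random.default_rng(1).random((32, 32))  # synthetic stand-in frame
    patches = extract_patches(frame, size=5, stride=2)
    kernels = learn_sparse_kernels(patches, n_kernels=8)
    maps = feature_maps(frame, kernels, size=5)
    print(patches.shape, kernels.shape, maps.shape)
    print("fraction of zero weights:", float(np.mean(kernels == 0)))
```

Because the kernels in this sketch are built from patches of the frame itself, each one can be viewed as a small image template, which is one plausible reading of the "explicit data meaning" claimed above. No auxiliary dataset is involved, and the only tunable quantities are the kernel count and the shrinkage ratio.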