Robust appearance feature learning using pixel‐wise discrimination for visual tracking

Minji Kim, Sungchan Kim

doi:10.4218/etrij.2018-0486

Abstract

Because video sequences are high‐dimensional, it is often difficult to acquire a dataset large enough to train tracking models. We therefore revisit the idea of hand‐crafted feature learning, which avoids this dataset requirement. The proposed approach comprises two phases, detection and tracking, which are distinguished by how severely the appearance of a target changes. The detection phase addresses severe and rapid variations by learning a new appearance model that classifies pixels as foreground (target) or background. For robust target representation, we further combine raw pixel features, namely color intensity and spatial location, with convolutional feature activations. The tracking phase follows the target by searching each frame for the region that achieves the best pixel‐level agreement with the model learned in the detection phase. This two‐phase approach yields efficient and accurate tracking and outperforms recent methods in various challenging cases of target appearance change.
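The two phases summarized in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: a nearest-class-mean classifier stands in for the learned appearance model, the convolutional feature activations are omitted, and every name here (`window_features`, `detect`, `classify`, `track`, `LOC_WEIGHT`, `make_frame`) is hypothetical. It also assumes that every search window lies fully inside the frame.

```python
import numpy as np

LOC_WEIGHT = 0.1  # assumed weight of location features relative to color


def window_features(frame, box, margin):
    """Per-pixel features and in-box labels for a window around `box`.

    `frame` is an (H, W, 3) float array; `box` is (x, y, w, h) in pixels.
    Features are RGB intensity plus weighted, normalized (x, y) location.
    """
    H, W, _ = frame.shape
    x, y, w, h = box
    ys, xs = np.mgrid[y - margin:y + h + margin, x - margin:x + w + margin]
    color = frame[ys, xs].reshape(-1, 3)
    loc = LOC_WEIGHT * np.column_stack([xs.ravel() / W, ys.ravel() / H])
    inside = ((xs >= x) & (xs < x + w) & (ys >= y) & (ys < y + h)).ravel()
    return np.hstack([color, loc]), inside


def detect(frame, box, margin=5):
    """Detection phase (stand-in): one feature centroid per pixel class."""
    feats, inside = window_features(frame, box, margin)
    return feats[inside].mean(axis=0), feats[~inside].mean(axis=0)


def classify(feats, model):
    """Label a pixel foreground if it is closer to the foreground centroid."""
    fg_mean, bg_mean = model
    d_fg = ((feats - fg_mean) ** 2).sum(axis=1)
    d_bg = ((feats - bg_mean) ** 2).sum(axis=1)
    return d_fg < d_bg


def track(frame, prev_box, model, radius=5, margin=5):
    """Tracking phase: pick the shifted box with the best pixel-level agreement."""
    x, y, w, h = prev_box
    best_box, best_score = prev_box, -1.0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            cand = (x + dx, y + dy, w, h)
            feats, inside = window_features(frame, cand, margin)
            # Agreement: fraction of pixels whose predicted class matches
            # the class implied by the candidate box.
            score = float(np.mean(classify(feats, model) == inside))
            if score > best_score:
                best_box, best_score = cand, score
    return best_box, best_score


def make_frame(top_left, size=64, side=10):
    """Synthetic frame: a red square target on a blue background."""
    frame = np.zeros((size, size, 3))
    frame[...] = (0.1, 0.1, 0.8)
    x, y = top_left
    frame[y:y + side, x:x + side] = (0.9, 0.1, 0.1)
    return frame


model = detect(make_frame((20, 20)), (20, 20, 10, 10))
box, score = track(make_frame((23, 21)), (20, 20, 10, 10), model)
```

In this synthetic case the target moves by (3, 1) pixels between frames, and the exhaustive search recovers the shifted box because that is the only candidate where every pixel in the window agrees with the model. A real tracker would need a richer classifier and features to cope with the appearance changes the abstract describes.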

Full Text