Abstract

Visual object tracking is a core problem in computer vision. One of the main challenges is distinguishing a moving target from nearby similar distractors when only a single-view image of the scene is available. To overcome this challenge, in this article we acquire multiview images of the scene with a light-field camera. Multiview images capture the 4-D structure of the objects rather than only the 2-D plane, but they are more difficult to process. We therefore propose a novel representation for multiview images, the macro-epipolar plane image (macro-EPI), which highlights both the spatial topological and the angular information of the target and distractors. It is obtained by slicing the original multiview images into pieces and restacking these pieces in an ordinal manner. Because the resulting macro-EPI is mapped into 2-D space, we train a macro-EPI feature extractor with a modified autoencoder network. We then design a composite framework of two-pattern convolution filters, based on a discriminative correlation filter, for object tracking; it discriminates the target from the distractors by merging the macro-EPI features with the single-view image features. Experiments show that our method outperforms state-of-the-art methods in the presence of similar distractors.
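
The slice-and-restack idea behind the macro-EPI can be illustrated with a minimal sketch. This is not the article's exact construction, which the abstract does not specify. It assumes grayscale views stored as a NumPy array of shape (views, height, width), already ordered by angular position, and it assumes the "pieces" are horizontal strips. The function name `macro_epi_sketch` and the parameter `strip_height` are hypothetical.

```python
import numpy as np


def macro_epi_sketch(views: np.ndarray, strip_height: int = 1) -> np.ndarray:
    """Restack horizontal strips of multiview images into one 2-D array.

    Assumption (not from the article): `views` has shape (V, H, W) and the
    views are ordered by angular position. For each spatial strip, the copies
    of that strip from all V views are placed consecutively, so neighbouring
    rows of the output vary in angle while the strip order keeps the spatial
    layout.
    """
    num_views, height, width = views.shape
    if height % strip_height != 0:
        raise ValueError("height must be divisible by strip_height")
    num_strips = height // strip_height

    # Slice every view into strips: (V, S, h, W).
    strips = views.reshape(num_views, num_strips, strip_height, width)
    # Reorder so that the strip index comes first: (S, V, h, W).
    restacked = strips.transpose(1, 0, 2, 3)
    # Flatten back to a single 2-D image of shape (S * V * h, W).
    return restacked.reshape(num_strips * num_views * strip_height, width)


if __name__ == "__main__":
    demo_views = np.arange(2 * 4 * 3).reshape(2, 4, 3)  # 2 views, 4 rows, 3 columns
    print(macro_epi_sketch(demo_views).shape)
```

With `strip_height=1`, each group of V consecutive output rows resembles a conventional epipolar plane image for one image row; larger strips keep more local spatial context per piece. Since the result is an ordinary 2-D array, it can be passed to a standard 2-D convolutional feature extractor.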
