Abstract

To enable efficient instance search in video, compact descriptors for video segments have been proposed. These descriptors exploit the temporal redundancy within a segment to reduce descriptor size, and their segment-based representation allows more efficient matching. In this paper, we analyze the performance of different visual features when both lossless and lossy compression are applied to the set of descriptors of a video segment. We consider both handcrafted and deep features, i.e., visual features obtained from a trained deep convolutional neural network. We also propose optimizations to the extraction and matching procedures. Both the compression methods and the optimizations are evaluated experimentally on a large video data set.
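The abstract does not say how temporal redundancy is exploited. The sketch below is only a generic illustration under stated assumptions and is not the paper's method. For the lossless case, it assumes one binary descriptor per frame for the same tracked keypoint, stored as rows of a `uint8` array. Each descriptor is XOR-ed with its predecessor, so bits that stay stable over time become zeros, and the result is passed to DEFLATE (`zlib`), which stands in for whatever entropy coder a real system would use. For the lossy case, it assumes real-valued deep features, averages them into a single L2-normalised vector per segment, and quantises that vector to 8 bits. All function names (`encode_lossless`, `decode_lossless`, `encode_lossy`, `decode_lossy`) are hypothetical.

```python
import zlib

import numpy as np


def encode_lossless(frames: np.ndarray) -> bytes:
    """frames: uint8 array (num_frames, num_bytes), one binary descriptor per frame.
    Assumed layout: row t is the same tracked keypoint's descriptor at frame t."""
    deltas = frames.copy()
    # Keep frame 0 verbatim; XOR each later frame with the previous one so that
    # temporally stable bits become zeros, which compress well.
    deltas[1:] = np.bitwise_xor(frames[1:], frames[:-1])
    header = np.array(frames.shape, dtype=np.uint32).tobytes()  # 8-byte shape header
    return header + zlib.compress(deltas.tobytes(), 9)


def decode_lossless(blob: bytes) -> np.ndarray:
    shape = tuple(int(x) for x in np.frombuffer(blob[:8], dtype=np.uint32))
    deltas = np.frombuffer(zlib.decompress(blob[8:]), dtype=np.uint8).reshape(shape)
    # A cumulative XOR along the time axis undoes the delta chain exactly.
    return np.bitwise_xor.accumulate(deltas, axis=0)


def encode_lossy(features: np.ndarray) -> np.ndarray:
    """features: float array (num_frames, dim). Returns one 8-bit vector per segment."""
    mean = features.mean(axis=0)
    mean = mean / (np.linalg.norm(mean) + 1e-12)  # unit length, so entries lie in [-1, 1]
    # Uniform scalar quantisation of [-1, 1] onto 0..255.
    return np.round((mean + 1.0) * 127.5).astype(np.uint8)


def decode_lossy(code: np.ndarray) -> np.ndarray:
    return code.astype(np.float32) / 127.5 - 1.0


# Usage: simulate a 25-frame segment in which each frame flips a single bit.
rng = np.random.default_rng(0)
base = rng.integers(0, 256, size=32, dtype=np.uint8)
frames = np.tile(base, (25, 1))
for t in range(1, 25):
    frames[t] = frames[t - 1]
    byte, bit = int(rng.integers(0, 32)), int(rng.integers(0, 8))
    frames[t, byte] ^= np.uint8(1 << bit)

blob = encode_lossless(frames)
restored = decode_lossless(blob)

features = rng.normal(size=(25, 64)).astype(np.float32)
segment_code = encode_lossy(features)
approx = decode_lossy(segment_code)
```

The XOR-delta step is lossless, so `restored` matches `frames` exactly. The lossy path reduces the whole segment to 64 bytes at the cost of per-frame detail. This trade-off between descriptor size and matching accuracy is the kind of effect the abstract says is evaluated, although the paper's actual coding schemes may differ.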
