Abstract

In this paper, we propose a compact and low-complexity binary feature descriptor for video analytics. Our binary descriptor encodes the motion information of a spatio-temporal support region into a low-dimensional binary string. The descriptor is based on a binning strategy and a construction that binarizes separately the horizontal and vertical motion components of the spatio-temporal support region. We pair our descriptor with a novel Fisher Vector (FV) scheme for binary data to project a set of binary features into a fixed length vector in order to evaluate the similarity between feature sets. We test the effectiveness of our binary feature descriptor with FVs for action recognition, which is one of the most challenging tasks in computer vision, as well as gait recognition and animal behavior clustering. Several experiments on the KTH, UCF50, UCF101, CASIA-B, and TIGdog datasets show that the proposed binary feature descriptor outperforms the state-of-the-art feature descriptors in terms of computational time and memory and storage requirements. When paired with FVs, the proposed feature descriptor attains a very competitive performance, outperforming several state-of-the-art feature descriptors and some methods based on convolutional neural networks.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call