Towards Optimal VLAD for Human Action Recognition from Still Images

Lei Zhang, Jiqing Han, Xiantong Zhen

doi:10.1109/icassp.2016.7471995

Abstract

Human action recognition from still images has recently drawn increasing attention in vision-based human behavior analysis, and it poses great challenges due to large inter-class ambiguity and intra-class variability. The vector of locally aggregated descriptors (VLAD) has achieved state-of-the-art performance in many image classification tasks based on local features. The success of VLAD is largely due to its high descriptive ability and computational efficiency. In this paper, working towards optimal VLAD representations for human action recognition from still images, we improve VLAD by tackling two important issues: empty cavity and assignment ambiguity. The empty cavity issue severely compromises the performance of VLAD and has long been overlooked. We investigate the empty cavity and provide an effective solution to it, which largely improves the performance of VLAD. We also propose middle-level assignments to overcome the assignment ambiguity; these assignments are more reliable and provide more useful information for realistic activities. We have conducted extensive experiments on two widely used benchmarks to validate the proposed method for human action recognition from still images. Our method achieves performance competitive with state-of-the-art algorithms.
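
As background for the two issues named above, the following is a minimal sketch of standard hard-assignment VLAD encoding, not the improved method proposed in this paper. It assumes that "empty cavity" refers to codewords that receive no local descriptors in a given image, which leaves all-zero blocks in the encoding; that reading is an interpretation of the abstract rather than a definition taken from it. The function name `vlad_encode` and the toy data are hypothetical.

```python
import numpy as np

def vlad_encode(descriptors, centers):
    """Baseline VLAD: sum residuals to the nearest center, then L2-normalize.

    descriptors: (n, d) array of local features from one image.
    centers:     (k, d) array of codewords (e.g. from k-means).
    Returns the flattened (k * d,) encoding and per-center assignment counts.
    """
    k, d = centers.shape
    # Squared Euclidean distance from every descriptor to every center.
    dists = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    nearest = dists.argmin(axis=1)  # hard assignment: one center per descriptor

    vlad = np.zeros((k, d))
    for i in range(k):
        members = descriptors[nearest == i]
        if len(members) > 0:
            vlad[i] = (members - centers[i]).sum(axis=0)
        # A center with no members keeps an all-zero block
        # (assumed here to be the "empty cavity" case).

    counts = np.bincount(nearest, minlength=k)
    flat = vlad.ravel()
    norm = np.linalg.norm(flat)
    if norm > 0:
        flat = flat / norm
    return flat, counts

# Toy example: three codewords, three descriptors, none near the third codeword.
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
descriptors = np.array([[1.0, 0.0], [0.0, 2.0], [9.0, 10.0]])

encoding, counts = vlad_encode(descriptors, centers)
empty = np.flatnonzero(counts == 0)  # indices of codewords with no descriptors
print("assignment counts:", counts)
print("empty codewords:", empty)
```

In this sketch each descriptor contributes to exactly one codeword, so a descriptor lying near the boundary between two codewords is assigned as confidently as one lying at a center. That is one common way to describe assignment ambiguity under hard assignment; the middle-level assignments proposed in the paper are not reproduced here.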

Full Text