Abstract

Attributes now play a vital role in characterizing crowded scenes. Compared with low-level visual features, attribute-informed processing can capture rich semantic information. However, effectively assigning attributes to a crowded scene remains a challenging task. In this letter, inspired by a recently proposed zero-shot learning framework, we propose a novel attribute assignment method that maps low-level features to predefined attributes. In particular, we propose to exploit attribute dependency during the attribute assignment phase, which we regard as our main contribution. In addition, to further enhance performance, an effective low-level feature extraction mechanism is also proposed. More precisely, appearance and motion features are first extracted from several sampled video frames and the corresponding optical flow fields via a deep convolutional neural network, and are then separately aggregated using Fisher vector encoding to form the low-level representation of the crowded scene. Experimental results on the challenging WWW dataset demonstrate that both the proposed attribute assignment method and the low-level feature extraction mechanism outperform the state of the art.
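The abstract only outlines the low-level feature pipeline (CNN descriptors from sampled frames and optical flow fields, aggregated with Fisher vector encoding), so the sketch below is a generic illustration of that aggregation step rather than the letter's actual implementation. The choice of CNN, layer, number of GMM components (K=64 here), and normalization are all assumptions; `app_desc` and `mot_desc` stand in for hypothetical per-frame appearance and per-flow motion descriptors.

```python
# Minimal sketch: aggregate local CNN descriptors into a Fisher vector and
# concatenate the appearance and motion encodings into one video representation.
import numpy as np
from sklearn.mixture import GaussianMixture


def fit_gmm(descriptors, n_components=64, seed=0):
    """Fit a diagonal-covariance GMM to local CNN descriptors (N x D)."""
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="diag", random_state=seed)
    gmm.fit(descriptors)
    return gmm


def fisher_vector(descriptors, gmm):
    """Encode N x D descriptors as a 2*K*D Fisher vector (mean + variance parts)."""
    X = np.atleast_2d(descriptors)
    N = X.shape[0]
    w, mu, var = gmm.weights_, gmm.means_, gmm.covariances_   # (K,), (K,D), (K,D)
    gamma = gmm.predict_proba(X)                               # N x K posteriors
    sigma = np.sqrt(var)

    # Gradients w.r.t. the Gaussian means and standard deviations
    # (standard improved-FV formulation with diagonal covariances).
    diff = (X[:, None, :] - mu[None, :, :]) / sigma[None, :, :]              # N x K x D
    g_mu = np.einsum("nk,nkd->kd", gamma, diff) / (N * np.sqrt(w)[:, None])
    g_sigma = np.einsum("nk,nkd->kd", gamma, diff ** 2 - 1.0) / (N * np.sqrt(2 * w)[:, None])

    fv = np.concatenate([g_mu.ravel(), g_sigma.ravel()])
    fv = np.sign(fv) * np.sqrt(np.abs(fv))        # power (signed-sqrt) normalization
    return fv / (np.linalg.norm(fv) + 1e-12)      # L2 normalization


# Usage sketch: random arrays stand in for CNN features extracted from sampled
# frames (appearance) and their optical flow fields (motion).
app_desc = np.random.randn(200, 512)   # placeholder per-frame CNN descriptors
mot_desc = np.random.randn(200, 512)   # placeholder per-flow CNN descriptors
video_repr = np.concatenate([
    fisher_vector(app_desc, fit_gmm(app_desc)),
    fisher_vector(mot_desc, fit_gmm(mot_desc)),
])  # low-level representation fed to the attribute assignment stage
```

The two modalities are encoded separately and then concatenated, mirroring the abstract's statement that appearance and motion features are "respectively aggregated" by Fisher vector encoding; how the resulting representation is mapped to attributes under the zero-shot framework is not specified in the abstract and is therefore not sketched here.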
