Abstract

This paper proposes a new Fisher Vector (FV) model, named Scale FV (ScaleFV), to improve visual feature encoding for human action recognition. Although many feature encoding methods have been proposed, temporal scale information has been largely ignored. Our investigation shows that, like spatial scale information, which is known to be important in extracting and encoding visual features, temporal scale information also plays an important role in video content analysis. To demonstrate this, a definition of temporal scale in videos is given, and it is shown that both spatial and temporal scale information can be encoded into the FV model by slightly modifying the underlying Gaussian Mixture Model (GMM). Furthermore, an enhanced FV model, termed Combined FV (CombFV), is designed to capture both position and scale information for human action recognition. Comparative experiments demonstrate the superior performance of the proposed methods.
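To make the idea of encoding scale through the GMM more concrete, the following is a minimal sketch and not the paper's actual formulation. It assumes one simple possibility: each local descriptor is extended with its spatial and temporal scale values before a standard FV (gradient with respect to the means of a diagonal-covariance GMM) is computed. The helper names `append_scales` and `fisher_vector_means`, the toy scale values, and the hand-set GMM parameters are all hypothetical and chosen only for illustration.

```python
import numpy as np


def append_scales(descriptors, spatial_scale, temporal_scale):
    """Append per-descriptor scale values as two extra dimensions.

    Assumption for illustration only: scale enters the GMM as extra
    feature dimensions. The paper's modification may differ.
    """
    return np.hstack([descriptors,
                      spatial_scale[:, None],
                      temporal_scale[:, None]])


def fisher_vector_means(X, weights, means, variances):
    """Standard FV component: gradient w.r.t. the GMM means.

    X: (N, D) descriptors; weights: (K,); means, variances: (K, D).
    Returns a vector of length K * D.
    """
    n = X.shape[0]
    diff = X[:, None, :] - means[None, :, :]               # (N, K, D)
    # Log-density of each descriptor under each diagonal Gaussian.
    log_prob = -0.5 * np.sum(diff ** 2 / variances
                             + np.log(2.0 * np.pi * variances), axis=2)
    # Posterior (soft assignment) of each descriptor to each component.
    log_post = np.log(weights) + log_prob
    log_post -= log_post.max(axis=1, keepdims=True)        # numerical stability
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)                # (N, K)
    # Posterior-weighted, variance-normalized deviations from the means.
    grad = np.einsum("nk,nkd->kd", post, diff / np.sqrt(variances))
    grad /= n * np.sqrt(weights)[:, None]
    return grad.ravel()


# Toy usage with random data; the GMM parameters are set by hand, not fitted.
rng = np.random.default_rng(0)
desc = rng.normal(size=(50, 4))                 # 50 local descriptors, 4-D
sigma = rng.choice([1.0, 2.0, 4.0], size=50)    # hypothetical spatial scales
tau = rng.choice([1.0, 2.0], size=50)           # hypothetical temporal scales
X = append_scales(desc, sigma, tau)             # shape (50, 6)

K = 3
weights = np.full(K, 1.0 / K)
means = rng.normal(size=(K, X.shape[1]))
variances = np.ones((K, X.shape[1]))
fv = fisher_vector_means(X, weights, means, variances)
print(fv.shape)
```

In this sketch the encoding grows by two dimensions per Gaussian component, so the scale information is carried in the same vector as the appearance statistics. In practice the GMM would be fitted on training descriptors rather than set by hand.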
