Abstract

A new low-level visual feature, called the spatio-temporal context distribution feature of interest points, is used to describe human actions. Each action video is represented as a set of relative XYT coordinates between pairs of interest points within a local region. From the input image frames, the Locally Weighted Word Context (LWWC) descriptor encodes the spatial context of interest points rather than being limited to a single interest point, and Graph Regularized Nonnegative Matrix Factorization (GNMF) encodes the geometrical information by constructing a nearest-neighbour graph. Using the kernel weights extracted from the resulting feature variables, a kernel-weighted SVM is modelled to jointly capture two kinds of compatibility: between multilevel action features and action classes, and between multilevel scene features and scene classes. The contextual relationship between action classes and scene classes is then derived using the kernel weight as a variable.
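To make the pairwise relative-coordinate idea concrete, here is a minimal sketch in Python. It is not the authors' implementation. The abstract does not specify the shape of the local region, the pair ordering, or how the relative coordinates are aggregated into a distribution, so the following are all hypothetical choices made for illustration: a circular spatial radius combined with a temporal window, pairs taken in input order, a uniform 3-D histogram, and the names `relative_xyt_pairs` and `context_histogram`.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

XYT = Tuple[float, float, float]  # (x, y, t) of one detected interest point


def relative_xyt_pairs(points: Sequence[XYT], radius_xy: float,
                       radius_t: float) -> List[XYT]:
    """Hypothetical local-region rule: keep (dx, dy, dt) = q - p for every
    pair (p, q), p before q in input order, whose spatial distance is at most
    radius_xy and whose temporal distance is at most radius_t."""
    pairs: List[XYT] = []
    for p, q in combinations(points, 2):
        dx, dy, dt = q[0] - p[0], q[1] - p[1], q[2] - p[2]
        if math.hypot(dx, dy) <= radius_xy and abs(dt) <= radius_t:
            pairs.append((dx, dy, dt))
    return pairs


def context_histogram(pairs: Sequence[XYT], radius_xy: float,
                      radius_t: float, bins: int = 4) -> List[float]:
    """Hypothetical aggregation: quantise each relative coordinate into `bins`
    equal cells over [-radius, radius] and return a normalised
    bins**3 histogram (all zeros if there are no pairs)."""
    assert radius_xy > 0 and radius_t > 0 and bins > 0

    def cell(v: float, r: float) -> int:
        # Map v in [-r, r] to a cell index in [0, bins - 1], clamping edges.
        k = int((v + r) / (2 * r) * bins)
        return min(max(k, 0), bins - 1)

    hist = [0.0] * (bins ** 3)
    for dx, dy, dt in pairs:
        idx = (cell(dx, radius_xy) * bins + cell(dy, radius_xy)) * bins \
            + cell(dt, radius_t)
        hist[idx] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


if __name__ == "__main__":
    pts = [(0, 0, 0), (3, 4, 1), (100, 100, 0), (1, 0, 5)]
    rel = relative_xyt_pairs(pts, radius_xy=5, radius_t=2)
    print(rel)  # only the first two points fall in one local region
    print(len(context_histogram(rel, 5, 2)))
```

A fixed-size histogram is used here only because it turns a variable number of point pairs into a vector of constant length, which downstream steps such as matrix factorization or an SVM kernel can consume. The paper's actual encoding (LWWC followed by GNMF) may differ.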
