Abstract

A new feature-representation method for recognizing actions in broadcast videos, which focuses on the relationship between human actions and camera motions, is proposed. With this method, keypoint trajectories are extracted as motion features in spatio-temporal sub-regions called "spatio-temporal multiscale bags" (STMBs). Global representations and local representations from one sub-region in the STMBs are then combined to create a "glocal pairwise representation" (GPR). The GPR captures the co-occurrence of camera motions and human actions. Finally, two-stage SVM classifiers are trained on STMB-based GPRs and used to identify specified human actions in video sequences. It was experimentally confirmed that the proposed method can robustly detect specific human actions in broadcast basketball videos.
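The paper itself does not provide an implementation, but the pipeline described above (pair a global camera-motion descriptor with a local sub-region descriptor, then apply two SVM stages) can be sketched roughly as follows. This is a minimal illustration under assumed inputs: the histograms, the concatenation-based pairing, and the role of each SVM stage (candidate filtering, then action recognition) are hypothetical stand-ins, not the authors' exact formulation.

```python
# Hypothetical sketch (not the authors' code): a two-stage SVM pipeline over
# paired global/local motion descriptors, assuming precomputed histograms.
import numpy as np
from sklearn.svm import SVC

def glocal_pair(global_hist, local_hist):
    """Concatenate a global (camera-motion) histogram with a local
    (sub-region) histogram -- one plausible pairwise representation."""
    return np.concatenate([global_hist, local_hist])

# Toy data: 200 video segments, each with a 64-bin global and a 64-bin
# local trajectory histogram (placeholder values for illustration).
rng = np.random.default_rng(0)
g = rng.random((200, 64))
l = rng.random((200, 64))
y_candidate = rng.integers(0, 2, 200)   # stage 1 label: candidate segment?
y_action = rng.integers(0, 2, 200)      # stage 2 label: target action present?

X = np.array([glocal_pair(gi, li) for gi, li in zip(g, l)])

stage1 = SVC(kernel="rbf").fit(X, y_candidate)            # coarse filtering
mask = stage1.predict(X) == 1
stage2 = SVC(kernel="rbf").fit(X[mask], y_action[mask])   # final action decision
```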
