Abstract
Action recognition in videos plays an important role in computer vision and multimedia, and it remains challenging because of the complexity of spatial and temporal information. Trajectory-based approaches have recently been shown to be efficient, and this paper proposes a new framework and algorithm, trajectory space information based multiple kernel learning (TSI-MKL). First, dense trajectories are extracted as raw features while three saliency maps, corresponding to color, space, and optical flow, are computed on the frames. Second, a new method that combines these saliency maps is proposed to filter the extracted trajectories, yielding a set of salient trajectories that contain only foreground motion regions. Next, a novel two-layer clustering method groups the salient trajectories into several semantic groups, and the final video representation is generated by encoding each group. Finally, the representations of the different semantic groups are fed into the proposed kernel function of a multiple kernel classifier. Experiments on three popular video action datasets demonstrate that the presented approach performs competitively with the state of the art.
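As a rough illustration of the trajectory filtering step, the sketch below combines three saliency maps and keeps only trajectories that pass through salient regions. It is not the method of this paper: the abstract does not state how the maps are combined, so equal-weight averaging, the mean-saliency test, the `threshold` value, and the function names `combine_saliency` and `filter_trajectories` are all assumptions made for illustration.

```python
import numpy as np

def combine_saliency(color, space, flow):
    """Average three saliency volumes of shape (T, H, W).

    Assumption: each map is min-max scaled, then the three are
    averaged with equal weight. The paper may combine them differently.
    """
    def rescale(m):
        m = m.astype(float)
        span = m.max() - m.min()
        return (m - m.min()) / span if span > 0 else np.zeros_like(m)
    return (rescale(color) + rescale(space) + rescale(flow)) / 3.0

def filter_trajectories(trajectories, saliency, threshold=0.5):
    """Keep trajectories whose mean saliency along the path is >= threshold.

    Each trajectory is (start_frame, points), where points is an
    (L, 2) array of (x, y) positions, one per consecutive frame.
    """
    kept = []
    for start, pts in trajectories:
        pts = np.asarray(pts)
        frames = start + np.arange(len(pts))
        # Round to the nearest pixel and clamp to the frame borders.
        xs = np.clip(np.rint(pts[:, 0]).astype(int), 0, saliency.shape[2] - 1)
        ys = np.clip(np.rint(pts[:, 1]).astype(int), 0, saliency.shape[1] - 1)
        if saliency[frames, ys, xs].mean() >= threshold:
            kept.append((start, pts))
    return kept

# Toy data: 3 frames of 4x4 pixels, with the left half salient in all maps.
fg = np.zeros((3, 4, 4))
fg[:, :, :2] = 1.0
saliency = combine_saliency(fg, fg, fg)

foreground_traj = (0, np.array([[0, 0], [1, 1], [1, 2]]))
background_traj = (0, np.array([[3, 0], [3, 1], [3, 2]]))
salient = filter_trajectories([foreground_traj, background_traj], saliency)
```

In this toy setup only the trajectory lying in the salient left half survives, which mirrors the stated goal of retaining foreground motion and discarding background trajectories.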