Abstract

One of the driving forces behind behavior recognition in video is the analysis of surveillance footage, in which humans are monitored and their actions are classified as normal or as deviations from the norm. Local spatio-temporal features have gained attention as effective descriptors for action recognition in video, yet the use of texture as a local descriptor remains relatively unexplored. This paper presents work on human action recognition in video, proposing a fusion of appearance, motion and texture as the local descriptor for a bag-of-features model. Rigorous experiments were conducted on the recorded UTP dataset using the proposed descriptor. The average accuracy obtained was 85.92% for the fused descriptor, compared to 75.06% for the combination of shape and motion descriptors. This result shows improved performance for the proposed descriptor over the combination of appearance and motion as the local descriptor of an interest point.
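As a rough illustration of the descriptor-fusion and bag-of-features pipeline summarized above, the Python sketch below concatenates per-interest-point appearance, motion and texture descriptors and quantizes them against a visual vocabulary. The descriptor dimensions, the use of k-means (via scikit-learn) for the vocabulary, and the vocabulary size are assumptions made for illustration only; they are not details taken from the paper.

```python
import numpy as np
from sklearn.cluster import KMeans


def fuse_descriptors(appearance, motion, texture):
    """Concatenate per-interest-point descriptors into one fused vector.

    appearance, motion, texture: arrays of shape (n_points, d_a), (n_points, d_m),
    (n_points, d_t), e.g. HOG-like, HOF-like and LBP-like histograms
    (hypothetical shapes, chosen only for this sketch).
    """
    return np.hstack([appearance, motion, texture])


def build_codebook(train_descriptors, k=1000, seed=0):
    """Cluster fused descriptors from training videos into a visual vocabulary.

    k-means is one common choice for bag-of-features codebooks; the paper's
    actual clustering method and vocabulary size are not specified here.
    """
    return KMeans(n_clusters=k, random_state=seed, n_init=10).fit(train_descriptors)


def bag_of_features(video_descriptors, codebook):
    """Quantize a video's fused descriptors and return a normalized word histogram."""
    words = codebook.predict(video_descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)
```

The resulting per-video histograms would then be fed to a standard classifier (e.g. an SVM) to label the action, following the usual bag-of-features setup.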
