Abstract

In the past few decades, research has been carried out to automatically find humans in video sequences. Automatic human detection in videos is gaining interest for numerous applications such as driver assistance systems, security, people counting, human gait characterization, video annotation and retrieval, and crowd flow analysis. Manual annotation of a video is a time-consuming task that involves human annotators with varying biases. In this paper, we present three computer vision algorithms (contour-based, HOG-based, and SURF-based) and propose a deep learning technique that automatically extracts spatiotemporal annotations of humans and represents each detection by a bounding box. Experiments show accuracies of 86%, 92.5%, 94%, and 95.5% for the four methods, respectively. The results show that annotation accuracy increases while human effort is reduced compared with manual annotation. We also introduce a new dataset, ASSVS_KICS, captured with a high-quality stationary camera, which contains scenarios based on our community for video surveillance research.
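
To give a concrete sense of the HOG-based approach mentioned above, the sketch below uses OpenCV's built-in HOG descriptor with its default pedestrian SVM to emit per-frame bounding boxes. The detector parameters and the input file name (surveillance.mp4) are illustrative assumptions, not the authors' exact pipeline or dataset.

```python
# Minimal sketch of HOG-based human detection producing per-frame bounding boxes.
# Assumes OpenCV's default people detector; parameters and file name are illustrative.
import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

cap = cv2.VideoCapture("surveillance.mp4")  # hypothetical input video
annotations = []  # spatiotemporal annotations as (frame_index, x, y, w, h)

frame_idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Each detection is returned as a bounding box (x, y, w, h).
    boxes, weights = hog.detectMultiScale(frame, winStride=(8, 8),
                                          padding=(8, 8), scale=1.05)
    for (x, y, w, h) in boxes:
        annotations.append((frame_idx, int(x), int(y), int(w), int(h)))
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    frame_idx += 1

cap.release()
print(f"Extracted {len(annotations)} bounding-box annotations")
```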
