Abstract

In the past few decades, research has been carried out to automatically find humans in video sequences. Automatic human detection in videos is gaining interest for numerous applications such as driver assistance systems, security, people counting, human gait characterization, video annotation and retrieval, and crowd flow analysis. Manual annotation of a video is a time-consuming task that involves human annotators with varying biases. In this paper, we present three computer vision algorithms (contour-based, HOG-based, and SURF-based) and propose a deep learning technique that automatically extracts spatiotemporal annotations of humans and represents each detection by a bounding box. Experiments show accuracies of 86%, 92.5%, 94%, and 95.5% for the four methods, respectively. The results show that annotation accuracy increases while human effort is reduced compared with manual annotation. We also introduce a new dataset, ASSVS_KICS, captured with a high-quality stationary camera, which contains scenarios based on our community for video surveillance research.
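
To give a concrete sense of the HOG-based approach mentioned above, the sketch below uses OpenCV's built-in HOG descriptor with its default pedestrian SVM to emit per-frame bounding boxes. The detector parameters and the input file name (surveillance.mp4) are illustrative assumptions, not the authors' exact pipeline or dataset.

```python
# Minimal sketch of HOG-based human detection producing per-frame bounding boxes.
# Assumes OpenCV's default people detector; parameters and file name are illustrative.
import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

cap = cv2.VideoCapture("surveillance.mp4")  # hypothetical input video
annotations = []  # spatiotemporal annotations as (frame_index, x, y, w, h)

frame_idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Each detection is returned as a bounding box (x, y, w, h).
    boxes, weights = hog.detectMultiScale(frame, winStride=(8, 8),
                                          padding=(8, 8), scale=1.05)
    for (x, y, w, h) in boxes:
        annotations.append((frame_idx, int(x), int(y), int(w), int(h)))
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    frame_idx += 1

cap.release()
print(f"Extracted {len(annotations)} bounding-box annotations")
```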
