Abstract

Given the huge amount of aerial surveillance video captured daily, an automated video understanding system is needed to extract information and generate metadata that is easy to search, browse, and summarize, and that an end user can readily understand. In this paper, we propose a Video-To-Text engine called ViTex that automatically generates text descriptions of the content of a video. The ViTex engine first segments an input video sequence according to pre-defined semantic classes using a Mixture-of-Experts blob segmentation algorithm. The resulting segmentation is then converted into a semantic concept graph based on domain knowledge and a semantic concept hierarchy. Next, the initial semantic concept graph is summarized and pruned. Finally, according to the summarized semantic concept graph and its changes over time, text descriptions are automatically generated using one of three description schemes: key-frame, key-object, and key-change descriptions. We have applied the ViTex engine to aerial surveillance video and compared its output with ground-truth text descriptions written by humans.
