Abstract

The explosive growth of video data has brought great challenges to video retrieval, which aims to find out related videos from a video collection. Most users are usually not interested in all the content of retrieved videos but have a more fine-grained need. In the meantime, most existing methods can only return a ranked list of retrieved videos lacking a proper way to present the video content. In this paper, we introduce a distinctively new task, namely One-Stop Video Delivery (OSVD) aiming to realize a comprehensive retrieval system with the following merits: it not only retrieves the relevant videos but also filters out irrelevant information and presents compact video content to users, given a natural language query and video collection. To solve this task, we propose an end-to-end Hierarchical Video Graph Reasoning framework (HVGR) , which considers relations of different video levels and jointly accomplishes the one-stop delivery task. Specifically, we decompose the video into three levels, namely the video-level, moment-level, and the clip-level in a coarse-to-fine manner, and apply Graph Neural Networks (GNNs) on the hierarchical graph to model the relations. Furthermore, a pairwise ranking loss named Progressively Refined Loss is proposed based on prior knowledge that there is a relative order of the similarity of query-video, query-moment, and query-clip due to the different granularity of matched information. Extensive experimental results on benchmark datasets demonstrate that the proposed method achieves superior performance compared with baseline methods.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call