Abstract
The Video Browser Showdown addresses difficult video search challenges through an annual interactive evaluation campaign attracting research teams focusing on interactive video retrieval. The campaign aims to provide insights into the performance of participating interactive video retrieval systems, tested by selected search tasks on large video collections. For the first time in its ten-year history, the Video Browser Showdown 2021 was organized in a fully remote setting and hosted a record number of sixteen scoring systems. In this paper, we describe the competition setting, tasks and results and give an overview of state-of-the-art methods used by the competing systems. By looking at query result logs provided by ten systems, we analyze differences in retrieval model performance and browsing times before a correct submission. Through advances in data gathering methodology and tools, we provide a comprehensive analysis of ad-hoc video search tasks and discuss results, task design and methodological challenges. We highlight that almost all top-performing systems utilize some form of joint embedding for text-image retrieval and enable specification of temporal context in queries for known-item search. Whereas a combination of these techniques drives the current top-performing systems, we identify several future challenges for interactive video search engines and the Video Browser Showdown competition itself.
Highlights
In the twenty-first century, digital cameras decorate almost every corner in city centers and most pedestrians carry a smartphone capable of high-quality video
This paper focuses on the Video Browser Showdown 2021, a virtual event where a record number of participating teams tried to solve a large number of ad-hoc video search (AVS) and known-item search (KIS) tasks with their interactive video search systems
The availability of AVS data is one of the reasons we focus on AVS tasks; KIS tasks are analyzed in depth in previous papers [37,54]
Summary
In the twenty-first century, digital cameras decorate almost every corner in city centers and most pedestrians carry a smartphone capable of high-quality video. Many commercial search engines have been established, allowing users to satisfy certain search needs over video collections with sufficient retrieval precision. These search engines focus on returning matches to free-form text queries. High retrieval recall and interactive retrieval remain difficult challenges for current video search models. TRECVID [38], Video Browser Showdown (VBS) [37] and Lifelog Search Challenge [13] define retrieval tasks where both high recall and precision are essential to achieve a good score. This paper focuses on the Video Browser Showdown 2021, a virtual event (see Fig. 1) where a record number of participating teams tried to solve a large number of ad-hoc video search (AVS) and known-item search (KIS) tasks with their interactive video search systems. The remainder of this paper is structured as follows: Sect. 2 gives an overview of VBS 2021 and its tasks, Sect. 3 introduces the participating systems and summarizes their approaches, Sect. 4 shows the results of the interactive evaluation with a particular focus on AVS analysis, and Sect. 5 gives an outlook toward the future and concludes the paper.