Scalable video query optimization has re-emerged as an attractive research topic in recent years. The OTIF system, a video database with cutting-edge efficiency, introduced a new paradigm that uses view materialization to facilitate online query processing: it stores the results of multi-object tracking (MOT) queries to answer common video queries with sub-second latency. However, the cost of view materialization in OTIF is prohibitively high for supporting large-scale video streams. In this paper, we study efficient MOT-based view materialization in video databases. We first conduct a theoretical analysis and establish two types of optimality measures that serve as lower bounds for video frame sampling. To minimize the number of processed video frames, we propose a novel predictive sampling framework, LEAP, that exhibits near-optimal sampling performance. Its efficacy relies on a data-driven motion manager that enables accurate trajectory prediction, a compact object detection model obtained via knowledge distillation, and a robust cross-frame associator that connects moving objects across two frames with a large time gap. Extensive experiments are conducted on 7 real datasets against 7 baselines, with a comprehensive query set including selection, aggregation, and top-k queries. The results show that, with query accuracy comparable to OTIF, LEAP reduces the number of processed video frames by up to 9× and achieves a 5× speedup in query processing time. Moreover, LEAP demonstrates impressive throughput when handling large-scale video streams: a single NVIDIA RTX 3090 Ti GPU supports real-time MOT-based view materialization for 160 video streams simultaneously.