SFD: Similar Frame Dataset for Content-Based Video Retrieval

Abstract

Content-based video retrieval aims to retrieve near-duplicate entries of a given query video from a database, and it plays an important role in combating video piracy. Robustness to temporal dynamics is crucial for a representation model in video retrieval, since in practice the frames extracted from two copies of a video are rarely temporally aligned. Current image retrieval datasets, however, are ill-suited to evaluating this robustness. To address this issue, we collect the Similar Frame Dataset (SFD), which consists of 32,923 query-target pairs and 128,240 distraction images. The task posed by SFD is to retrieve the target frame from all items given a query frame. SFD is constructed by sampling frames from the Kinetics-700 action classification dataset; an object detection model (Faster R-CNN) and a Multimodal Large Language Model (BLIP2) are applied during sampling to select valid frames. In addition, we propose the Adjacent Frames Contrastive Learning (AFCL) framework, in which adjacent frames sampled from unlabeled videos serve as positive pairs. An image representation model trained under AFCL is robust to changing frames and achieves state-of-the-art performance on SFD. The code will be released at https://github.com/Chuan-shanjia/Similar-Frame-Dataset.
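The abstract does not give AFCL's training objective, but its description (adjacent frames from the same unlabeled video as positive pairs, contrastive training) matches the standard InfoNCE setup. The following is a minimal NumPy sketch under that assumption; the toy "video" features, the sampling of adjacent-frame pairs, and the temperature value are illustrative, not taken from the paper.

```python
import numpy as np

def l2_normalize(x):
    """Project each row onto the unit sphere, as is standard before a cosine-similarity contrastive loss."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def info_nce_loss(z_a, z_b, temperature=0.1):
    """InfoNCE loss over a batch of positive pairs.

    z_a, z_b: (N, D) L2-normalized embeddings. Row i of z_a and z_b
    embed adjacent frames of the same video (a positive pair); every
    other row in the batch serves as a negative for row i.
    """
    logits = z_a @ z_b.T / temperature            # (N, N) cosine similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positives sit on the diagonal: maximize their log-probability.
    return -np.mean(np.diag(log_prob))

rng = np.random.default_rng(0)
video = rng.normal(size=(16, 8))                  # toy stand-in for 16 frame features
anchors = rng.integers(0, 15, size=4)             # sample 4 anchor frame indices
z_a = l2_normalize(video[anchors])                # anchor frames
# adjacent frames, lightly perturbed to mimic temporal change
z_b = l2_normalize(video[anchors + 1] + 0.05 * rng.normal(size=(4, 8)))

loss = info_nce_loss(z_a, z_b)                    # a positive scalar
```

In a full training loop the rows of `z_a`/`z_b` would come from an image encoder applied to sampled adjacent frames, and the loss would be backpropagated through that encoder; this sketch only shows the pair construction and objective.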
