With the influx of online video uploads, a robust method for detecting video copies under various distortions is essential for copyright protection and search efficiency. We propose a compact video representation that uses a Siamese Neural Network for near-duplicate detection. This approach achieves up to 0.998 recall and 0.853 precision, even with distorted 56×56 px miniatures. Consequently, we derive extremely compact video descriptors, facilitating video fragment retrieval in large datasets. Our method pre-selects frames from a video by identifying local maxima of the inter-frame difference curve, preserving characteristic sequence patterns even after extreme compression. Using a simpler convolution-based model in the Siamese Neural Network, we improve results by up to 8% over VGG-16, with a 1.7-fold reduction in inference time. With compact descriptors of 1.875 kB, our models outperform others by more than a factor of two. Additionally, we introduce a new dataset with dynamically changing scenes, enhancing its suitability for near-duplicate video detection.
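The frame pre-selection step described above can be illustrated with a minimal sketch: pick the frames at local maxima of the inter-frame difference curve. This is an assumption-laden illustration, not the authors' implementation; the function name, the mean-absolute-difference metric, and the `min_gap` de-duplication parameter are all hypothetical choices.

```python
import numpy as np

def select_key_frames(frames, min_gap=5):
    """Pick frame indices at local maxima of the inter-frame difference curve.

    `frames` is a list of equally sized grayscale images as NumPy arrays.
    """
    # Mean absolute pixel difference between each pair of consecutive frames.
    diffs = np.array([
        np.abs(frames[i + 1].astype(np.float32) - frames[i].astype(np.float32)).mean()
        for i in range(len(frames) - 1)
    ])
    # A point on the curve is a local maximum if it exceeds both neighbours.
    peaks = [
        i for i in range(1, len(diffs) - 1)
        if diffs[i] > diffs[i - 1] and diffs[i] > diffs[i + 1]
    ]
    # Enforce a minimum temporal gap so selected frames are not clustered.
    selected, last = [], -min_gap
    for p in peaks:
        if p - last >= min_gap:
            selected.append(p + 1)  # the frame following the peak difference
            last = p
    return selected
```

For example, in a sequence of ten dark frames where frame 5 onward suddenly becomes bright, the difference curve peaks at the transition and the sketch selects frame 5.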