Abstract

Face-based video retrieval (FBVR) is the task of retrieving videos that contain the same face shown in the query image. In this article, we present the first end-to-end FBVR pipeline that is able to operate on large datasets of unconstrained, multi-shot, multi-person videos. We adapt an existing audiovisual recognition dataset to the task of FBVR and use it to evaluate our proposed pipeline. We compare a number of deep learning models for shot detection, face detection, and face feature extraction as part of our pipeline on a validation dataset made of more than 4000 videos. We obtain 97.25% mean average precision on an independent test set composed of more than 1000 videos. The pipeline is able to extract features from videos at approximately 7 times real-time speed, and it is able to perform a query on thousands of videos in less than 0.5 s.
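For readers unfamiliar with the metric quoted above, mean average precision can be sketched as follows. This is a minimal illustration, not the evaluation code used in the article: the function names are hypothetical, and it assumes each query comes with a full ranking of the video set, marked as relevant or not, best match first.

```python
def average_precision(ranked_relevance):
    """Average precision for one query.

    ranked_relevance: list of booleans, best match first, where True means
    the video at that rank contains the queried face (assumed input format).
    """
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            # Precision measured at the rank of each relevant video.
            precision_sum += hits / rank
    # Assumes the ranking covers every relevant video in the dataset.
    return precision_sum / hits if hits else 0.0


def mean_average_precision(all_ranked_relevance):
    """Mean of the per-query average precision values."""
    if not all_ranked_relevance:
        return 0.0
    total = sum(average_precision(r) for r in all_ranked_relevance)
    return total / len(all_ranked_relevance)
```

A query whose relevant videos all appear at the top of the ranking scores 1.0; every irrelevant video ranked above a relevant one lowers the score.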

Highlights

  • Video retrieval is the task of matching an input query with a set of videos, in order to select the videos that are relevant to the given query

  • RetinaFace obtained the best results with every face recognition network; the performance gain was much more substantial in combination with ArcFace

  • One reason is that RetinaFace produces more accurate face landmarks, used for face alignment, which we found to be fundamental to obtaining good performance with ArcFace

Summary

Introduction

Video retrieval is the task of matching an input query with a set of videos, in order to select the videos that are relevant to the given query. Retrieval tasks often deal with one of two possible types of queries: text based or content based. Text-based queries look at metadata and data annotations such as file names, descriptions, or tags in order to determine the relevant items to retrieve. This relies on accurate and complete annotations, which are time-consuming to produce and are rare in real-world scenarios. Content-based retrieval instead uses visual features extracted from images or videos in order to perform the matching. This makes content-based retrieval more desirable (and more accurate) in situations where accurate text-based annotations are not available.
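As a rough illustration of content-based matching, the sketch below ranks videos by comparing a query feature vector with the face feature vectors stored for each video. It is an assumption-laden example rather than the pipeline described in the article: the function names, the use of cosine similarity, and the choice to score a video by its best-matching face are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_videos(query_feature, video_features):
    """Return video ids sorted from most to least similar to the query.

    video_features: dict mapping a video id to the list of face feature
    vectors extracted from that video (assumed storage format).
    """
    scores = {}
    for video_id, faces in video_features.items():
        # Assumption: a video is scored by its best-matching face;
        # videos with no detected faces get the lowest possible score.
        scores[video_id] = max(
            (cosine_similarity(query_feature, face) for face in faces),
            default=-1.0,
        )
    return sorted(scores, key=scores.get, reverse=True)
```

For example, `rank_videos([0.0, 1.0], {"a": [[1.0, 0.0]], "b": [[0.0, 1.0]]})` places video "b" first, because one of its faces points in the same direction as the query feature.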
