Video Face Clustering With Self-Supervised Representation Learning

Vivek Sharma, Rainer Stiefelhagen, M Saquib Sarfraz, Makarand Tapaswi

doi:10.1109/tbiom.2019.2947264

Abstract

Characters are a key component of understanding the story conveyed in TV series and movies. With the rise of advanced deep face models, identifying face images may seem like a solved problem. However, as face detectors get better, clustering and identification need to be revisited to address increasing diversity in facial appearance. In this paper, we propose unsupervised methods for feature refinement with application to video face clustering. Our emphasis is on distilling the essential information, identity, from the representations obtained using deep pre-trained face networks. We propose a self-supervised Siamese network that can be trained without the need for video/track-based supervision and that can also be applied to image collections. We evaluate our methods on three video face clustering datasets. Thorough experiments, including generalization studies, show that our methods outperform current state-of-the-art methods on all datasets. The datasets and code are available at https://github.com/vivoutlaw/SSIAM.

Full Text