Self-Supervised Video-Centralised Transformer for Video Face Clustering.

Yujiang Wang, Stavros Petridis, Maja Pantic, Yiming Lin, Mingzhi Dong, Yiming Luo, Pingchuan Ma, Jie Shen

doi:10.1109/tpami.2023.3243812

Abstract

This article presents a novel method for face clustering in videos using a video-centralised transformer. Previous works often employed contrastive learning to learn frame-level representation and used average pooling to aggregate the features along the temporal dimension. This approach may not fully capture the complicated video dynamics. In addition, despite the recent progress in video-based contrastive learning, few have attempted to learn a self-supervised clustering-friendly face representation that benefits the video face clustering task. To overcome these limitations, our method employs a transformer to directly learn video-level representations that can better reflect the temporally-varying property of faces in videos, while we also propose a video-centralised self-supervised framework to train the transformer model. We also investigate face clustering in egocentric videos, a fast-emerging field that has not been studied yet in works related to face clustering. To this end, we present and release the first large-scale egocentric video face clustering dataset named EasyCom-Clustering. We evaluate our proposed method on both the widely used Big Bang Theory (BBT) dataset and the new EasyCom-Clustering dataset. Results show the performance of our video-centralised transformer has surpassed all previous state-of-the-art methods on both benchmarks, exhibiting a self-attentive understanding of face videos.
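The contrast the abstract draws between average pooling and attention-based temporal aggregation can be illustrated with a minimal sketch. This is not the authors' transformer or training framework: the function names, the toy frame features, and the `query` vector (standing in for a learned parameter) are all assumptions made for illustration. It only shows how attention can weight frames unequally, whereas average pooling treats every frame the same.

```python
import math

def average_pool(frames):
    """Temporal mean of a list of T frame feature vectors (each of length D)."""
    t = len(frames)
    return [sum(col) / t for col in zip(*frames)]

def attention_pool(frames, query):
    """Weight each frame by softmax(query . frame / sqrt(D)), then sum.

    `query` is a hypothetical vector; in a trained model it would be a
    learned parameter, here it is simply supplied by the caller.
    """
    d = len(query)
    scores = [sum(q * x for q, x in zip(query, f)) / math.sqrt(d) for f in frames]
    peak = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]      # non-negative and summing to 1
    pooled = [sum(w * f[i] for w, f in zip(weights, frames)) for i in range(d)]
    return pooled, weights

# Toy face track: three 2-D frame features; the last frame differs from the rest.
frames = [[1.0, 0.0], [1.0, 0.0], [0.0, 4.0]]
mean_feat = average_pool(frames)
attn_feat, attn_weights = attention_pool(frames, query=[0.0, 1.0])
```

With an all-zero query the attention weights become uniform and `attention_pool` reduces to `average_pool`, which is one way to see average pooling as a special case of attention-weighted aggregation.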

Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Publication Date: Jan 1, 2023
Citations: 2

Similar Papers

Constrained Clustering and Its Application to Face Clustering in Videos
Baoyuan Wu ... Yifan Zhang
01 Jun 2013

Constrained Multi-View Video Face Clustering.
Xiaochun Cao ... Huazhu Fu
IEEE Transactions on Image Processing | VOL. 24
30 Jul 2015

Deep Metric Learning with Improved Triplet Loss for Face Clustering in Videos
Shun Zhang ... Jinjun Wang
01 Jan 2015

Face clustering in videos based on spectral clustering techniques
Christina Chrysouli ... Ioannis Pitas
01 Nov 2011
