Talking pictures: Temporal grouping and dialog-supervised person recognition

Timothee Cour, Akash Nagle, Ben Taskar, Benjamin Sapp

doi:10.1109/cvpr.2010.5540106

Abstract

We address the character identification problem in movies and television videos: assigning names to faces on the screen. Most prior work on person recognition in video assumes some supervised data such as a screenplay or hand-labeled faces. In this paper, our only source of ‘supervision’ is dialog cues: first-, second- and third-person references (such as “I'm Jack”, “Hey, Jack!” and “Jack left”). While this kind of supervision is sparse and indirect, we exploit multiple modalities and their interactions (appearance, dialog, mouth movement, synchrony, continuity-editing cues) to effectively resolve identities through local temporal grouping followed by global weakly supervised recognition. We propose a novel temporal grouping model that partitions face tracks across multiple shots while respecting appearance, geometric and film-editing cues and constraints. In this model, states represent partitions of the k most recent face tracks, and transitions represent compatibility of consecutive partitions. We present dynamic programming inference and discriminative learning for the model. The individual face tracks are subsequently assigned a name by learning a classifier from partial label constraints. The weakly supervised classifier incorporates multiple-instance constraints from dialog cues as well as soft grouping constraints from our temporal grouping. We evaluate both the temporal grouping and final character naming on several hours of TV and movies.
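To make the temporal grouping idea concrete, the following is a minimal sketch of dynamic programming over partitions of the k most recent face tracks. It is not the paper's model: the signed `affinity` matrix (positive for "likely the same person", negative otherwise), the function names `canon` and `group_tracks`, and the scoring rule (sum of affinities between a new track and the window tracks it joins) are all assumptions made for illustration. The paper's learned weights and its geometric and film-editing cues are omitted.

```python
def canon(labels):
    """Relabel a partition so labels appear in order of first occurrence."""
    remap = {}
    return tuple(remap.setdefault(l, len(remap)) for l in labels)


def group_tracks(affinity, k=3):
    """Viterbi-style search; a state is a partition of the k most recent tracks.

    affinity[i][j] is a hypothetical signed score between tracks i and j.
    Returns (best score, group id per track).
    """
    n = len(affinity)
    # Maps a window partition to (best score so far, group id per track so far).
    best = {(0,): (0.0, [0])}
    for t in range(1, n):
        nxt = {}
        for state, (score, groups) in best.items():
            # Slide the window: drop the oldest track once the window is full.
            kept = state[1:] if len(state) == k else state
            start = t - len(kept)  # index of the first track still in the window
            # The new track joins an existing block or opens a new one.
            for lab in sorted(set(kept)) + [max(kept, default=-1) + 1]:
                members = [start + j for j, l in enumerate(kept) if l == lab]
                gain = sum(affinity[t][m] for m in members)
                gid = groups[members[0]] if members else max(groups) + 1
                new_state = canon(kept + (lab,))
                cand = (score + gain, groups + [gid])
                # Keep only the best-scoring path into each window partition.
                if new_state not in nxt or cand[0] > nxt[new_state][0]:
                    nxt[new_state] = cand
        best = nxt
    return max(best.values(), key=lambda v: v[0])


# Toy shot/reverse-shot sequence: persons A, B, A, B, C.
people = ["A", "B", "A", "B", "C"]
affinity = [[1 if p == q else -1 for q in people] for p in people]
score, groups = group_tracks(affinity, k=3)
print(groups)
```

Because a state records only how the last k tracks are partitioned, a character who is off screen for more than k consecutive tracks receives a new group id in this sketch. This matches the local nature of the grouping described above, with global identities left to the later recognition stage.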
