Context-Oriented Name-Face Association in Web Videos

Zhineng Chen, Wei Zhang, Xiaoyan Gu, Hongtao Xie, Bailan Feng

doi:10.1007/978-3-319-48896-7_62

Abstract

Automatically linking faces in Web videos with their names scattered in the surrounding text (e.g., the user-generated title and tags) is an important task for many applications. Traditionally, this task is accomplished either by jointly exploring visual-textual consistency under constraints, or by leveraging external resources, e.g., public facial images. This paper follows the second paradigm and implements name-face association by matching faces appearing in Web videos with carefully collected Web facial images. Specifically, given a Web video, we first identify the relevant and discriminative tags from its surrounding text. These tags are defined as Contextual Tags (CTags), as they roughly give the semantic context of the video (e.g., who is doing what, when, and where). Then, facial images are retrieved by issuing assembled text queries to a commercial search engine, where each query contains a detected name and one of the top CTags. In this way, we crawl facial images that are highly relevant to the person in the video context, and thus the task of name-face association can be implemented simply by matching faces. Compared with traditional methods, the novelty of our approach lies in exploiting both the visual content of the video and the crowdsourced text of its context to find more specific facial images on the Web and thereby facilitate the association. Experimental results on real-world Web videos containing faces and celebrity names show that the proposed method outperforms several existing methods.
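
The query assembly and face matching steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the `top_k` and `threshold` parameters, the use of cosine similarity over precomputed face feature vectors, and the best-match decision rule are all assumptions made for illustration; the abstract does not specify how faces are represented or compared.

```python
from itertools import product
from math import sqrt


def assemble_queries(names, ctags, top_k=2):
    """Pair every detected name with each of the top_k CTags (hypothetical helper).

    `ctags` is assumed to be already ranked, most relevant first.
    """
    return [f"{name} {tag}" for name, tag in product(names, ctags[:top_k])]


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def associate(video_faces, web_faces_by_name, threshold=0.5):
    """Assign each video face the name whose crawled images match it best.

    video_faces: {face_id: feature_vector}
    web_faces_by_name: {name: [feature_vector, ...]} from the crawled images
    A face stays unnamed (None) if no similarity exceeds `threshold`
    (an assumed rejection rule, not taken from the paper).
    """
    result = {}
    for face_id, feat in video_faces.items():
        best_name, best_score = None, threshold
        for name, feats in web_faces_by_name.items():
            score = max((cosine(feat, f) for f in feats), default=0.0)
            if score > best_score:
                best_name, best_score = name, score
        result[face_id] = best_name
    return result


# Toy usage with made-up names, tags, and 2-D "features".
queries = assemble_queries(["Person A", "Person B"], ["concert", "2010", "live"])
labels = associate(
    {"f1": [1.0, 0.0], "f2": [0.0, 1.0], "f3": [-1.0, 0.0]},
    {"Person A": [[0.9, 0.1]], "Person B": [[0.1, 0.9]]},
)
```

In this toy run each name is paired with the two highest-ranked tags, and the third face is left unnamed because neither set of crawled images resembles it closely enough.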

Full Text