Abstract

While face recognition has been studied intensively, few works attempt to recognize multiple faces simultaneously in videos, a capability with potential applications in practical video surveillance. In this paper, we address the problem of visual recognition and tracking of multiple faces in real-world videos involving large pose variation and occlusion. Instead of recognizing each face independently, we impose constraints of inter-frame temporal smoothness and within-frame identity exclusivity on multiple faces in videos, and model the tasks of multiple face recognition (MFR) and multiple face tracking (MFT) jointly in an alternating optimization framework. We show that this joint formulation of the two tasks leads to significantly improved MFR accuracy. Specifically, because appearance matching of face instances over consecutive frames plays a critical role in MFT, we propose an identity-specific metric learning method with a part-based object representation that learns a localized transformation for each face subject in an online manner; under this transformation, face instances of the same subject over consecutive frames are pulled as close together as possible, while those of different subjects are pushed far apart. Empirically, we evaluate our method on several MFR sequences against baselines, and the results demonstrate that our method achieves improved accuracy in various challenging recognition scenarios.
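To make the within-frame identity exclusivity constraint concrete, the sketch below contrasts independent per-face labeling with a joint labeling in which no identity is used twice in a frame. It is only an illustration, not the paper's optimization: the function names, the similarity scores, and the brute-force search are all assumptions introduced here, and the search is practical only for a handful of faces.

```python
from itertools import permutations


def assign_independent(scores):
    """Label each face with its best-scoring identity, ignoring other faces.

    scores[i][j] is a hypothetical similarity between face i and identity j
    (higher is better). Two faces may end up with the same identity.
    """
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


def assign_exclusive(scores):
    """Label all faces in one frame jointly so that no identity repeats.

    Brute-force search over one-to-one assignments; assumes there are at
    least as many candidate identities as faces.
    """
    n_faces = len(scores)
    n_ids = len(scores[0]) if scores else 0
    if n_faces > n_ids:
        raise ValueError("need at least as many identities as faces")

    best, best_total = None, float("-inf")
    for perm in permutations(range(n_ids), n_faces):
        # perm[i] is the identity proposed for face i; all entries differ.
        total = sum(scores[i][perm[i]] for i in range(n_faces))
        if total > best_total:
            best, best_total = list(perm), total
    return best


# Made-up scores for two faces and three identities in a single frame.
scores = [
    [0.90, 0.80, 0.10],
    [0.85, 0.30, 0.20],
]
independent_labels = assign_independent(scores)
exclusive_labels = assign_exclusive(scores)
print(independent_labels, exclusive_labels)
```

With these made-up scores, independent labeling gives both faces identity 0, whereas the exclusive labeling moves the first face to identity 1 because that maximizes the total score without reusing an identity. For larger frames, a polynomial-time assignment solver would be a more sensible choice than enumeration.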
