Face recognition, face measurement, camera control, measurement of expressions, and other tasks can benefit from online visual multi-face tracking. Given the availability of high-quality general-purpose detectors and tracking-by-detection frameworks, we provide guidance on how to develop a multi-face tracker out of standard components. In this paper, we train common object detectors specifically on faces to understand how well these detectors perform, and we evaluate different classifier loss functions. Our case study tracks faces in council meetings and in parliamentary settings such as the Canadian House of Commons, for which we create an annotated video set as a benchmark (see Fig. 1). Such meetings are often recorded from multiple cameras while participants and audience members walk around. Fast camera switching and zooming lead to significant scale changes of faces. These settings can therefore be characterized as tracking in unconstrained video, which reduces tracking accuracy and increases the likelihood of identity switches (IDS) between face labels. However, being able to track in unconstrained video enables a wider range of measurement applications. We find that while online tracking based on combining state-of-the-art methods can lead to high-quality tracking results, there is still a large gap between offline and online methods. The discussed method can be adapted to other tracking tasks for which large image databases are available.