Face recognition accuracy remains far from satisfactory in public video surveillance scenarios. Far-sighted low-resolution (LR) frontal faces preserve the holistic facial profile but lack sufficient clarity, while near-sighted high-resolution (HR) tilted faces show rich facial detail yet an incomplete facial structure, because the head, seen from overhead, partially occludes the face. Following this observation, this paper proposes a dual-branch HR frontal face reconstruction network that explicitly exploits the coupled complementarity hidden in far and near face images of the same subject: one branch performs super-resolution (SR) of the LR frontal face, and the other performs detail fusion and holistic compensation between multiple HR tilted faces and the super-resolved frontal result. In particular, we propose a secondary relevance attention mechanism to enhance the embedding of key features; it performs rough and then precise feature matching and embedding in sequence, enabling coarse-to-fine progressive compensation. Further, scale-entangled densely connected blocks (SEDCB) gradually integrate the relevance information at different scales (which arise from the different sighting distances) to promote information interaction between the features of tilted faces. We also propose a training scheme based on ternary coupled sample pairs (LR far-sighted frontal face, HR near-sighted tilted face, normal clear ground-truth face) to supervise network optimization. Extensive experimental results on two real-world tilt-view face datasets show that, compared with competing methods, our method not only reconstructs more realistic HR frontal faces but also facilitates the downstream face identification task.
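The rough-then-precise matching idea can be illustrated with a minimal sketch. This is not the paper's secondary relevance attention mechanism, whose details the abstract does not give. It only assumes that each face is represented by a flat list of patch feature vectors, that relevance is measured by cosine similarity, and that a matched feature is embedded by a relevance-weighted addition. The function names and the `block` parameter are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def mean_vector(vectors):
    """Average a list of vectors into one coarse descriptor."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def coarse_to_fine_embed(query, reference, block=2):
    """Match each query feature to a reference feature in two passes.

    query, reference: lists of feature vectors (assumed here to be patch
    features of the SR frontal face and of one HR tilted face).
    Returns one (reference_index, relevance, fused_vector) tuple per query.
    """
    # Rough pass: group reference features into blocks and average each block.
    blocks = [list(range(i, min(i + block, len(reference))))
              for i in range(0, len(reference), block)]
    coarse = [mean_vector([reference[j] for j in idx]) for idx in blocks]

    results = []
    for q in query:
        # Pick the block whose averaged descriptor is most similar to q.
        best_block = max(range(len(blocks)), key=lambda b: cosine(q, coarse[b]))
        # Precise pass: search only inside the selected block.
        best_j = max(blocks[best_block], key=lambda j: cosine(q, reference[j]))
        rel = cosine(q, reference[best_j])
        # Embed: add the matched feature, weighted by its relevance score.
        fused = [a + rel * b for a, b in zip(q, reference[best_j])]
        results.append((best_j, rel, fused))
    return results
```

In this sketch the rough pass narrows the search to one block, so the precise pass compares the query against only `block` candidates instead of every reference feature. A learned attention module would replace both the averaging and the hard `max` selection.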