Face verification and identification traditionally follow a symmetric matching approach, where the same model (e.g., ResNet-50 vs. ResNet-50) generates embeddings for both gallery and query images, ensuring compatibility. However, real-world scenarios often demand asymmetric matching, especially when query devices have limited computational resources or employ heterogeneous models (e.g., ResNet-50 vs. Swin Transformer). This asymmetry can degrade face recognition performance because embeddings produced by different models are not directly compatible. To address this asymmetric face recognition problem, we introduce the Learnable Anchor Embedding (LAE) model, which features two key innovations: a Shared Learnable Anchor and a Light Cross-Attention Mechanism. The Shared Learnable Anchor acts as a dynamic attractor, aligning heterogeneous gallery and query embeddings within a unified embedding space. The Light Cross-Attention Mechanism complements this alignment by reweighting embeddings relative to the anchor, efficiently refining their placement within the unified space. Extensive evaluations on several face recognition benchmarks demonstrate LAE’s superior performance, particularly in asymmetric settings. Its robustness and scalability make it an effective solution for real-world applications such as edge-device authentication, cross-platform verification, and resource-constrained environments.
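The abstract does not give implementation details, but the core idea (a shared learnable vector toward which heterogeneous embeddings are attracted, with an attention weight derived from each embedding's similarity to that anchor) can be illustrated with a minimal NumPy sketch. All names, dimensions, and the sigmoid-gated interpolation below are assumptions for illustration, not the paper's actual architecture; in the real model the anchor would be a trainable parameter updated by gradient descent.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def anchor_align(embeddings, anchor, temperature=1.0):
    """Pull heterogeneous embeddings toward a shared anchor (illustrative sketch).

    embeddings: (n, d) outputs of any backbone (ResNet-50, Swin Transformer, ...).
    anchor:     (d,) shared anchor vector -- a learnable parameter in the
                actual LAE model, a fixed array in this hypothetical sketch.
    """
    # L2-normalize so similarities are cosine similarities.
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    anc = anchor / np.linalg.norm(anchor)
    # One attention-style weight per embedding: how much of the original
    # embedding to keep vs. how far to pull it toward the anchor.
    sims = emb @ anc                      # (n,) cosine similarity to anchor
    w = sigmoid(sims / temperature)       # (n,) gates in (0, 1)
    aligned = w[:, None] * emb + (1.0 - w)[:, None] * anc
    # Re-normalize so aligned embeddings live on the unit sphere.
    return aligned / np.linalg.norm(aligned, axis=1, keepdims=True)

rng = np.random.default_rng(0)
gallery = rng.normal(size=(4, 128))   # e.g. server-side ResNet-50 embeddings
query = rng.normal(size=(4, 128))     # e.g. edge-device Swin embeddings
anchor = rng.normal(size=128)

g_aligned = anchor_align(gallery, anchor)
q_aligned = anchor_align(query, anchor)
```

Because both gallery and query embeddings are interpolated toward the same anchor, their cosine similarity to the anchor (and hence to each other, on average) can only increase, which is the intuition behind aligning heterogeneous models in one unified space.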