Convolutional neural networks have achieved excellent performance on face recognition (FR) by learning highly discriminative features with advanced loss functions. These improved loss functions share the same underlying idea: maximizing inter-class variance or minimizing intra-class variance. In this article, we take a different perspective and enlarge the inter-class variance by directly penalizing the weight vectors of the last fully connected layer, which represent the class centers. To this end, we propose Orthogonality loss, an elegant penalty term appended to a common classification loss to learn discriminative representations. The main idea is that, for the weight vectors to be discriminative, they should be as close to mutually orthogonal as possible in the vector space. More specifically, the optimization objective of Orthogonality loss is the first and second moments of the cosine similarities between weight vectors. We performed empirical studies on simulated long-tailed datasets to show the generalization ability of the proposed approach to long-tailed distributions. Further, extensive experiments on large-scale face recognition benchmarks, including Labeled Faces in the Wild (LFW), the IARPA Janus Benchmarks (IJB-A, IJB-B, and IJB-C), MegaFace Challenge 1 (MF1), and MS-Celeb-1M Low-shot Learning, demonstrate that Orthogonality loss outperforms strong baselines, showcasing its broad applicability and effectiveness.
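As a rough illustration only (the abstract does not give the exact formulation), the penalty could be sketched as follows: the last fully connected layer's weight vectors are normalized, their pairwise cosine similarities are computed, and the mean (first moment) and mean square (second moment) of the off-diagonal similarities are penalized. The function name, the way the two moments are combined, and the balancing factor lam are assumptions for illustration.

```python
# Hypothetical sketch of an orthogonality-style penalty on the weight vectors
# of the last fully connected layer, assuming the penalty combines the first
# and second moments of their pairwise cosine similarities.
import torch
import torch.nn.functional as F

def orthogonality_penalty(weight: torch.Tensor) -> torch.Tensor:
    """weight: (num_classes, feat_dim) weight matrix of the last FC layer."""
    w = F.normalize(weight, dim=1)                 # unit-norm class centers
    cos = w @ w.t()                                # pairwise cosine similarities
    n = cos.size(0)
    mask = ~torch.eye(n, dtype=torch.bool, device=cos.device)
    off_diag = cos[mask]                           # exclude self-similarities
    first_moment = off_diag.mean()                 # push average similarity toward 0
    second_moment = (off_diag ** 2).mean()         # suppress large pairwise overlaps
    return first_moment + second_moment

# Usage (lam is an assumed weighting hyperparameter):
# total_loss = classification_loss + lam * orthogonality_penalty(fc_layer.weight)
```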