Building a face recognition pipeline involves numerous challenges, such as variations in lighting, pose, and facial expression. The main stages of the pipeline are detection, alignment, feature extraction, and face representation, and each of them is critical for accurate recognition. The article analyzes and compares modern face detection and recognition algorithms and models in terms of their ability to correctly identify true positives (TP) and true negatives (TN) while minimizing false negatives (FN) and false positives (FP). Classical algorithms and lightweight models such as MediaPipe offer the highest speed but sacrifice some accuracy; conversely, heavier models like RetinaFace deliver greater accuracy at the expense of speed. For systems that prioritize maximum detection accuracy and minimal missed faces, DSFD or RetinaFace-Resnet50 are recommended, despite their slow performance, which makes them unsuitable for real-time detection. If the primary goal is maximum detection speed and occasional missed faces in uncontrolled conditions are acceptable, an SSD-based face detector is preferable. For applications that need a balance of speed and accuracy, RetinaFace-MobilenetV1 is the optimal choice, combining real-time detection speed with satisfactory accuracy. Among the recognition models, ArcFace performs best, with a TP rate of 0.92 and a TN rate of 0.91, indicating high accuracy both in identifying the correct person and in rejecting mismatched images; its FP rate is correspondingly low at 0.09. FaceNet follows with a TP rate of 0.89 and a TN rate of 0.94, reflecting strong resistance to incorrect matches. In contrast, VGGFace, DeepFace, and OpenFace show moderate TP rates between 0.61 and 0.78, along with higher FN and FP rates.
The DeepID model performs worst, with a TP rate of 0.47 and a TN rate of 0.60, reflecting substantial difficulty with accurate identification. The conclusions emphasize selecting models according to accuracy, speed, and resource requirements, and suggest RetinaFace and ArcFace/FaceNet as good trade-off options.
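The TP, TN, FN, and FP rates above can be made concrete with a small verification sketch. It assumes, as the ArcFace figures (TN 0.91, FP 0.09) suggest, that each rate is normalized within its class, so that TP rate + FN rate = 1 over same-person pairs and TN rate + FP rate = 1 over different-person pairs. The use of cosine similarity between embeddings, the 0.5 threshold, and the helper names `cosine_similarity` and `verification_rates` are illustrative assumptions, not the article's evaluation protocol; the embeddings themselves would come from a model such as ArcFace or FaceNet, which is not implemented here.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class VerificationRates:
    tp_rate: float  # same-person pairs accepted / all same-person pairs
    fn_rate: float  # same-person pairs rejected / all same-person pairs
    tn_rate: float  # different-person pairs rejected / all different-person pairs
    fp_rate: float  # different-person pairs accepted / all different-person pairs


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two embedding vectors (assumed comparison metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def verification_rates(
    scores: Sequence[float], same_person: Sequence[bool], threshold: float
) -> VerificationRates:
    """Count pair outcomes at a similarity threshold and normalize per class.

    A pair is accepted as a match when its score is >= threshold (hypothetical rule).
    """
    tp = fn = tn = fp = 0
    for score, same in zip(scores, same_person):
        accepted = score >= threshold
        if same:
            if accepted:
                tp += 1
            else:
                fn += 1
        elif accepted:
            fp += 1
        else:
            tn += 1
    genuine, impostor = tp + fn, tn + fp
    if genuine == 0 or impostor == 0:
        raise ValueError("need at least one same-person and one different-person pair")
    return VerificationRates(tp / genuine, fn / genuine, tn / impostor, fp / impostor)


if __name__ == "__main__":
    # Toy similarity scores, not data from the article.
    scores = [0.82, 0.71, 0.40, 0.65, 0.30, 0.12, 0.55, 0.20]
    same = [True, True, True, True, False, False, False, False]
    print(verification_rates(scores, same, threshold=0.5))
    # prints VerificationRates(tp_rate=0.75, fn_rate=0.25, tn_rate=0.75, fp_rate=0.25)
```

Raising the threshold trades a lower FP rate for a higher FN rate, which is why the reported rates for each model are only comparable under a stated threshold or operating point.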