Face recognition is a long-standing hot topic in computer vision. A face recognition system mainly consists of face detection, alignment, and feature extraction. At inference time, the extracted features are used to measure the similarity between faces, and the system outputs either whether two faces belong to the same person or which person in the registered set a face belongs to. Typically, the three stages of a recognition system are trained independently of each other, which has the following shortcomings: 1) feature maps are computed redundantly; 2) the system cannot be optimized end-to-end; 3) many faces that are of no use are detected and have features extracted. We propose a lightweight model for salient face detection and recognition that can be optimized end-to-end. While maintaining accuracy, it meets the real-time and memory requirements of embedded devices and terminals.
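The two inference outputs described above, verification (same person or not) and identification (which registered person), can be illustrated with a minimal sketch. It assumes cosine similarity as the similarity measure and a fixed decision threshold of 0.5; neither is stated in the text, and the function names `cosine_similarity`, `verify`, and `identify` are hypothetical. It is not the proposed model, only an illustration of how extracted feature vectors might be compared.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def verify(feat_a, feat_b, threshold=0.5):
    """Verification: decide whether two faces belong to the same person.

    The threshold value is an assumption chosen for illustration.
    """
    return cosine_similarity(feat_a, feat_b) >= threshold


def identify(probe, gallery, threshold=0.5):
    """Identification: find the best-matching identity in the registered set.

    `gallery` maps an identity name to its registered feature vector.
    Returns (name, score), or (None, score) if no match reaches the threshold.
    """
    best_name, best_score = None, -1.0
    for name, feat in gallery.items():
        score = cosine_similarity(probe, feat)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score


if __name__ == "__main__":
    # Toy 3-dimensional features; real face features are much longer.
    gallery = {"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]}
    print(identify([0.9, 0.1, 0.0], gallery))
    print(verify([1.0, 0.0], [0.0, 1.0]))
```

In practice the threshold would be tuned on a validation set to balance false accepts against false rejects.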