Pronuclei and blastomeres are key structures in early embryonic development, and localizing them simultaneously allows the developmental state of the embryo to be assessed more comprehensively. However, the biological characteristics of embryos cause several unavoidable problems in localization tasks, including blastomere overlap, pronuclei overlap, and the similarity between pronuclei and the background. In this study, we propose a novel localization network for pronuclei and blastomeres that addresses these problems. First, to handle the overlap of pronuclei and blastomeres, as well as the similarity between pronuclei and the background, we place a vision transformer with a bi-level routing attention module (BiFormer) in the backbone. BiFormer finds attention regions in the embryo image to capture more edge and texture information for both pronuclei and blastomeres, which supports better feature fusion and interaction in the neck of the network. Second, to enhance model performance and reduce redundant computation, the localization network uses partial convolution (PConv) in the backbone, which lets the backbone extract features more efficiently. In addition, to mitigate the impact of low-quality samples in embryo images on localization and to pay more attention to ordinary-quality samples, we use the wise intersection over union version 3 (WIoUv3) loss function, which has a dynamic non-monotonic focusing mechanism, thereby improving the overall performance of the algorithm. Experimental results show that our model achieves a mean average precision at an IoU threshold of 0.5 (mAP@0.5) of 92.4% in localizing pronuclei and blastomeres. In practical terms, the ability to accurately localize pronuclei and blastomeres allows for better assessment of embryo quality and supports selection of the best embryos.
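
Both the reported mAP@0.5 metric and the IoU-based loss rest on the intersection over union between a predicted box and a ground-truth box. The following is a minimal sketch of that quantity and of the 0.5 matching criterion, assuming axis-aligned boxes given as (x1, y1, x2, y2); the function names `iou` and `is_true_positive` are illustrative and are not taken from the authors' implementation, and the sketch covers neither WIoUv3 itself nor the full mAP computation.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap region; zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    # Guard against degenerate (zero-area) boxes.
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, threshold=0.5):
    """A prediction counts as a match at mAP@0.5 when IoU >= 0.5.

    Assumption: class labels already agree and each ground-truth box
    is matched at most once; that bookkeeping is omitted here.
    """
    return iou(pred_box, gt_box) >= threshold
```

Under this criterion, two heavily overlapping blastomeres can each be matched only if their predicted boxes stay closer to their own ground truth than to the neighbor's, which is why overlap makes the task difficult.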