Background: Accurate segmentation of tumor targets is critical for maximizing tumor control and minimizing normal tissue toxicity. We proposed a sequential and iterative U-Net (SI-Net) deep learning method to auto-segment the high-risk primary tumor clinical target volume (CTVp1) for treatment planning of nasopharyngeal carcinoma (NPC) radiotherapy.

Methods: The SI-Net is a variant of the U-Net architecture. Its input consists of one CT image, the CTVp1 contour on that image, and the next CT image. Its output is the predicted CTVp1 contour on the next CT image. The left side of the SI-Net was designed to learn the volumetric features, and the right side to localize the contour on the next image. Two prediction directions were tested: from inferior to superior (forward direction) and from superior to inferior (backward direction). The performance of the SI-Net was compared with that of the U-Net using the Dice similarity coefficient (DSC), Jaccard index (JI), average surface distance (ASD), and Hausdorff distance (HD).

Results: The DSC and JI values from the forward direction SI-Net model were 5% and 6% higher, respectively, than those from the U-Net model (0.84 ± 0.04 vs. 0.80 ± 0.05 and 0.74 ± 0.05 vs. 0.69 ± 0.05, p < 0.001). The ASD and HD values were also smaller for the SI-Net model, indicating better performance (2.8 ± 1.0 vs. 3.3 ± 1.0 mm and 8.7 ± 2.5 vs. 9.7 ± 2.7 mm, p < 0.01). For the backward direction SI-Net model, the DSC and JI values were still better than those from the U-Net model (p < 0.01), although there were no significant differences in ASD and HD.

Conclusions: The SI-Net model preserved the continuity between adjacent images and thus improved the segmentation accuracy compared with the conventional U-Net model. This model has the potential to improve the efficiency and consistency of CTVp1 contouring for NPC patients.
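
The slice-to-slice prediction scheme and the two overlap metrics described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the `predict_next` callable (a stand-in for a trained SI-Net), the availability of a starting contour at `seed_index`, and the convention that increasing slice index runs from inferior to superior are all assumptions made for illustration.

```python
import numpy as np


def dice_and_jaccard(pred, truth):
    """Return (DSC, JI) for two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    intersection = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    total = pred.sum() + truth.sum()
    if total == 0:
        # Both masks empty: treat as perfect agreement (a convention, assumed here).
        return 1.0, 1.0
    dsc = 2.0 * intersection / total
    ji = intersection / union
    return float(dsc), float(ji)


def propagate_contours(ct_slices, seed_index, seed_mask, predict_next,
                       direction="forward"):
    """Propagate a contour slice by slice from a seed slice.

    ct_slices    : sequence of 2D arrays, assumed ordered inferior -> superior
    seed_index   : index of the slice whose contour is already known (assumed)
    seed_mask    : binary mask of the contour on that slice
    predict_next : hypothetical callable (image, mask, next_image) -> next_mask,
                   standing in for a trained model
    direction    : "forward" (increasing index) or "backward" (decreasing index)
    """
    if direction not in ("forward", "backward"):
        raise ValueError("direction must be 'forward' or 'backward'")
    step = 1 if direction == "forward" else -1
    masks = {seed_index: seed_mask}
    i = seed_index
    # Each prediction becomes part of the input for the following slice.
    while 0 <= i + step < len(ct_slices):
        masks[i + step] = predict_next(ct_slices[i], masks[i], ct_slices[i + step])
        i += step
    return masks
```

For example, with a dummy `predict_next` that simply copies the previous mask, `propagate_contours(slices, 2, mask, predict_next, "forward")` on a five-slice volume fills in slices 3 and 4, while `"backward"` fills in slices 1 and 0. Because each output is fed back as input, an error on one slice can carry over to later slices, which is one reason to evaluate both directions separately.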