Deep learning methods have demonstrated great potential for processing high-resolution images. The U-Net model, in particular, has proved effective for segmenting biomedical images. However, few studies have examined the application of deep learning to esophageal squamous cell carcinoma (ESCC) segmentation. This study therefore aimed to develop deep learning segmentation systems specifically for ESCC. A Visual Geometry Group (VGG)-based U-Net neural network architecture was used to develop the segmentation models. A pathological image cohort of surgical specimens was used for model training and internal validation, and two additional endoscopic biopsy section cohorts were used for external validation. Model performance was evaluated with several metrics: Intersection over Union (IOU), accuracy, positive predictive value (PPV), true positive rate (TPR), specificity, Dice similarity coefficient (DSC), area under the receiver operating characteristic curve (AUC), and F1-Score. Surgical specimens from ten patients were analyzed retrospectively, and each of the two biopsy section cohorts comprised five patients. Transfer learning models based on U-Net weights yielded the best results. For mucosa segmentation, the model achieved an IOU of 93.81% in internal validation, with all other metrics exceeding 96% (96.96% accuracy, 96.45% PPV, 96.65% TPR, 98.41% specificity, 96.81% DSC, 96.11% AUC, and 96.55% F1-Score). The tumor segmentation model attained an IOU of 91.95%, with all other metrics except AUC surpassing 95% (95.90% accuracy, 95.62% PPV, 95.71% TPR, 97.88% specificity, 95.81% DSC, 94.92% AUC, and 95.67% F1-Score).
In external validation of the tumor segmentation model, the IOU was 59.86% for validation cohort 1 (72.74% accuracy, 76.03% PPV, 77.17% TPR, 83.80% specificity, 74.89% DSC, 71.83% AUC, and 76.60% F1-Score) and 50.88% for validation cohort 2 (68.03% accuracy, 59.02% PPV, 66.87% TPR, 78.48% specificity, 67.44% DSC, 64.68% AUC, and 62.70% F1-Score). The models achieved satisfactory results in internal validation, supporting their potential future deployment on standard computers and integration with other artificial intelligence models in clinical practice. However, owing to the small size of the study, the models generalized less well in external validation; larger pathological section cohorts will be needed in future development to ensure robustness and generalizability.
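
The pixel-wise metrics reported above can be illustrated with a minimal sketch. It assumes that each predicted and reference mask is available as a flattened list of 0/1 pixel labels; the function names `segmentation_metrics` and `dsc_from_iou` are hypothetical and are not taken from the study, and the way the study aggregated metrics across images is not specified here. For a single pair of binary masks, DSC and IOU are linked by DSC = 2·IOU / (1 + IOU), and the reported IOU/DSC pairs are consistent with this identity (for example, an IOU of 93.81% corresponds to a DSC of 96.81%).

```python
def segmentation_metrics(pred, truth):
    """Pixel-wise metrics for two equal-length binary masks (flat lists of 0/1).

    Illustrative only: not the evaluation code used in the study.
    """
    if len(pred) != len(truth):
        raise ValueError("masks must contain the same number of pixels")

    # Confusion-matrix counts over all pixels.
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)

    def ratio(num, den):
        # Return NaN when a metric is undefined (empty denominator).
        return num / den if den else float("nan")

    return {
        "iou": ratio(tp, tp + fp + fn),
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "ppv": ratio(tp, tp + fp),
        "tpr": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "dsc": ratio(2 * tp, 2 * tp + fp + fn),
    }


def dsc_from_iou(iou):
    """Convert IOU to DSC for a single binary mask pair."""
    return 2 * iou / (1 + iou)


# Toy 8-pixel example: 2 true positives, 1 false positive,
# 1 false negative, 4 true negatives.
pred = [1, 1, 1, 0, 0, 0, 0, 0]
truth = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = segmentation_metrics(pred, truth)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")

# Consistency check against the internal-validation mucosa figures.
print(f"DSC implied by IOU 93.81%: {100 * dsc_from_iou(0.9381):.2f}%")
```

The same identity reproduces the other reported pairs (91.95% and 95.81%, 59.86% and 74.89%, 50.88% and 67.44%), which suggests that the DSC values follow directly from the IOU values rather than from a separate averaging scheme.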