Objective
Submucosal infiltration of less than 200 μm is considered an indication for endoscopic resection of superficial esophageal cancer and precancerous lesions. This study aims to identify risk factors for submucosal infiltration exceeding 200 μm in early esophageal cancer and precancerous lesions, and to establish and validate a corresponding predictive model.

Methods
Risk factors were identified using least absolute shrinkage and selection operator (LASSO) regression followed by multivariate logistic regression. Several machine learning (ML) classification models were compared to select the best-performing predictive model, and Shapley Additive Explanations (SHAP) were used to visualize the model.

Results
Predictors of submucosal infiltration exceeding 200 μm included endoscopic ultrasonography or magnifying endoscopy suggesting an invasion depth > SM1 (P < 0.001, OR = 3.972, 95% CI 2.161–7.478), esophageal wall thickening (P < 0.001, OR = 12.924, 95% CI 5.299–33.96), intake of pickled foods (P = 0.04, OR = 1.837, 95% CI 1.03–3.307), platelet-to-lymphocyte ratio (P < 0.001, OR = 0.284, 95% CI 0.137–0.556), tumor size (P = 0.027, OR = 2.369, 95% CI 1.128–5.267), percentage of circumferential mucosal defect (P < 0.001, OR = 5.286, 95% CI 2.671–10.723), and preoperative pathological type (P < 0.001, OR = 4.079, 95% CI 2.254–7.476). The logistic regression model built from these risk factors performed best, with an area under the curve (AUC) of 0.922 in the training set, 0.899 in the validation set, and 0.850 in the test set.

Conclusion
A logistic regression model complemented by SHAP visualizations effectively identifies early esophageal cancer that infiltrates more than 200 μm into the submucosa.
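As an aid to reading the reported AUC values, the sketch below shows one standard way to compute an AUC: as the probability that a randomly chosen positive case (here, infiltration exceeding 200 μm) receives a higher predicted score than a randomly chosen negative case, with ties counted as one half. This is an illustrative Python sketch, not the authors' code. The function name `auc` and the toy labels and scores are hypothetical and are not taken from the study data.

```python
def auc(labels, scores):
    """Rank-based AUC: share of (positive, negative) pairs in which the
    positive case has the higher score; tied pairs count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = infiltration > 200 um, 0 = otherwise.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
# Hypothetical predicted probabilities from some classifier.
scores = [0.9, 0.8, 0.35, 0.4, 0.3, 0.2, 0.1, 0.7]

toy_auc = auc(labels, scores)
print(f"AUC = {toy_auc:.4f}")  # 15 of 16 pairs ranked correctly -> 0.9375
```

On this reading, an AUC of 0.850 in the test set would mean that in roughly 85% of positive-negative pairs the model ranks the deeper-infiltrating lesion higher.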