Objectives: To develop and validate predictive models for esophageal squamous cell carcinoma (ESCC) based on terminal motif analysis of circulating cell-free DNA (cfDNA), with the aim of improving non-invasive detection of early-stage ESCC and its precancerous lesions.
Methods: Between August 2021 and November 2022, we prospectively collected plasma samples from 448 individuals at the Department of Endoscopy, Cancer Hospital, Chinese Academy of Medical Sciences for cfDNA extraction, library construction, and sequencing. We analyzed 201 cases of ESCC, 46 of high-grade intraepithelial neoplasia (HGIN), 46 of low-grade intraepithelial neoplasia (LGIN), 176 benign esophageal lesions, and 29 healthy controls. ESCC patients and control subjects were randomly assigned to a training set (n=284) and a validation set (n=122). In the training set, the cfDNA terminal motif matrices were z-score normalized, and the features that distinguished ESCC cases from controls were selected. A random forest classifier, Motif-1 (M1), was then developed using principal component analysis, ten-fold cross-validation, and recursive feature elimination. M1's performance was assessed in the validation set and in the precancerous lesion set. Subsequently, individuals with precancerous lesions were added to the dataset, and all participants were randomly allocated to new training (n=243), validation (n=105), and test (n=150) cohorts. Using the same procedure as for M1, we trained a second random forest model, Motif-2 (M2), on the training cohort. The validation cohort was used to establish M2's optimal threshold, and the model was then evaluated in the test cohort.
Results: We developed two cfDNA terminal motif-based predictive models for ESCC and its associated precancerous lesions. The first model, M1, achieved a sensitivity of 90.0%, a specificity of 77.4%, and an area under the curve (AUC) of 0.884 in the validation cohort. For LGIN, HGIN, and T1aN0 stage ESCC, M1's sensitivities were 76.1%, 80.4%, and 91.2%, respectively. Notably, its sensitivity for HGIN and T1aN0 ESCC combined reached 85.0%. Both predictive accuracy and sensitivity increased with disease progression (P<0.001). The second model, M2, achieved a sensitivity of 87.5%, a specificity of 77.4%, and an AUC of 0.857 in the test cohort. M2's sensitivities for detecting precancerous lesions and ESCC were 80.0% and 89.7%, respectively, and its sensitivity for HGIN and T1aN0 stage ESCC combined was 89.4%.
Conclusions: Two predictive models for ESCC and its precancerous lesions were developed based on cfDNA terminal motif analysis. Both show high sensitivity and specificity in identifying ESCC and its precancerous stages, indicating their potential for early ESCC detection.
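The preprocessing and evaluation steps named in the Methods can be illustrated with a minimal Python sketch. It is not the authors' pipeline, and several details are assumptions: the abstract does not state the motif length (4-mers at the fragment 5' end are assumed here), the function names are hypothetical, the population standard deviation is used for scaling, and a score at or above the threshold is treated as a positive call. The sketch shows how a motif frequency vector could be built per sample, how z-score normalization could reuse training-cohort statistics for the other cohorts, and how sensitivity and specificity follow from a fixed threshold.

```python
from collections import Counter
from itertools import product
from statistics import mean, pstdev

K = 4  # assumed motif length; the abstract does not specify it
MOTIFS = ["".join(p) for p in product("ACGT", repeat=K)]  # 256 possible 4-mers


def motif_frequencies(fragment_ends):
    """Relative frequency of each k-mer among fragment end sequences."""
    counts = Counter(e[:K].upper() for e in fragment_ends if len(e) >= K)
    valid = sum(counts[m] for m in MOTIFS)  # ends containing N are ignored
    return [counts[m] / valid if valid else 0.0 for m in MOTIFS]


def fit_zscore(train_matrix):
    """Per-motif mean and standard deviation, from the training cohort only."""
    columns = list(zip(*train_matrix))
    return [mean(c) for c in columns], [pstdev(c) for c in columns]


def apply_zscore(matrix, means, sds):
    """Scale any cohort with training statistics; constant columns map to 0."""
    return [
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, sds)]
        for row in matrix
    ]


def sensitivity_specificity(scores, labels, threshold):
    """labels: 1 = case, 0 = control; score >= threshold is called positive."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= threshold)
    fn = sum(1 for s, y in pairs if y == 1 and s < threshold)
    tn = sum(1 for s, y in pairs if y == 0 and s < threshold)
    fp = sum(1 for s, y in pairs if y == 0 and s >= threshold)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case and one control")
    return tp / (tp + fn), tn / (tn + fp)
```

Fitting the scaling statistics on the training cohort alone, and fixing the threshold before the test cohort is scored, keeps the held-out estimates free of information leakage; this mirrors the training, validation, and test split described for M2.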