Objective: To investigate, using diagnostic models established and validated with machine learning algorithms, the value of seven tumor-associated autoantibodies (TAABs), namely antibodies against p53, PGP9.5, SOX2, GAGE7, GBU4-5, MAGEA1 and CAGE, in the diagnosis of non-small cell lung cancer (NSCLC) and in differentiating NSCLC from benign lung nodules.

Methods: This was a retrospective study of clinical cases. Model-building cohort: a total of 227 patients with primary NSCLC who underwent radical lung cancer surgery in the Department of Thoracic Surgery, Shengjing Hospital of China Medical University, from November 2018 to June 2021 were enrolled as the NSCLC group, and 120 patients with benign lung nodules, 122 patients with pneumonia and 120 healthy individuals were selected as the control groups. External validation cohort: a total of 100 patients with primary NSCLC who underwent radical lung cancer surgery in the same department from May 2022 to December 2022 were enrolled as the NSCLC group, and 36 patients with benign lung nodules, 32 patients with pneumonia and 44 healthy individuals were selected as the control groups. In addition, patients with NSCLC were divided into early (stage 0-ⅠB) and mid-to-late (stage ⅡA-ⅢB) subgroups. The levels of the seven TAABs were measured by enzyme immunoassay, and serum concentrations of CEA and CYFRA21-1 were measured by electrochemiluminescence. Four machine learning algorithms (XGBoost, Lasso logistic regression, Naïve Bayes and Support Vector Machine) were used to establish classification models. The best-performing model was chosen based on evaluation metrics, and a multi-indicator combination model was then established. In addition, an online risk evaluation tool was generated to support clinical application.

Results: Except for anti-p53, the levels of the remaining six TAABs, CEA and CYFRA21-1 were significantly higher in the NSCLC group (P<0.05). Serum levels of anti-SOX2 [1.50 (0.60, 10.85) U/ml vs. 0.8 (0.20, 2.10) U/ml, Z=2.630, P<0.05] and anti-MAGEA1 antibodies [0.20 (0.10, 0.43) U/ml vs. 0.10 (0.10, 0.20) U/ml, Z=2.289, P<0.05], CEA [3.13 (2.12, 5.64) ng/ml vs. 2.11 (1.25, 3.09) ng/ml, Z=3.970, P<0.05] and CYFRA21-1 [4.31 (2.37, 7.14) ng/ml vs. 2.53 (1.92, 3.48) ng/ml, Z=3.959, P<0.05] were significantly higher in patients with mid-to-late-stage NSCLC than in those with early-stage NSCLC. The XGBoost model was used to establish a multi-indicator combined detection model (after removing p53). The six TAABs combined with CYFRA21-1 formed the best combination model for the diagnosis of NSCLC and early NSCLC. For NSCLC vs. control, NSCLC vs. benign lung nodules and early NSCLC vs. benign lung nodules, respectively, the optimal diagnostic thresholds were 0.410, 0.701 and 0.744 and the AUCs were 0.828, 0.757 and 0.741 in the model-building cohort; the corresponding AUCs in the external validation cohort were 0.760, 0.710 and 0.660.

Conclusion: In the diagnosis of NSCLC, the 6-TAABs panel is superior to the traditional tumor markers CEA and CYFRA21-1 and can compensate for their shortcomings. For the differential diagnosis of NSCLC and benign lung nodules, "6-TAABs+CYFRA21-1" is the most cost-effective combination and plays an important role in the prevention of and screening for early lung cancer.
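The abstract reports AUCs and optimal diagnostic thresholds but does not state how the thresholds were selected. Purely as an illustration, the sketch below assumes the threshold is the model score that maximizes the Youden index (sensitivity + specificity - 1), and computes the AUC as the probability that a positive case outscores a negative one. The function names `roc_auc` and `youden_threshold` and the toy scores are hypothetical and are not taken from the study.

```python
from typing import List, Tuple


def roc_auc(scores: List[float], labels: List[int]) -> float:
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def youden_threshold(scores: List[float], labels: List[int]) -> Tuple[float, float, float]:
    """Return (threshold, sensitivity, specificity) maximizing the Youden index.

    A case is called positive when its score is >= threshold.
    Assumption: the study may have used a different selection rule.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative case")
    best_j, best = -1.0, (0.0, 0.0, 0.0)
    for thr in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= thr)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < thr)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1.0
        if j > best_j:  # on ties, the lowest threshold is kept
            best_j, best = j, (thr, sens, spec)
    return best


# Invented toy data: predicted probabilities for 4 controls (0) and 5 cases (1).
toy_scores = [0.10, 0.20, 0.30, 0.50, 0.45, 0.60, 0.70, 0.80, 0.90]
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1, 1]

auc = roc_auc(toy_scores, toy_labels)
threshold, sensitivity, specificity = youden_threshold(toy_scores, toy_labels)
```

On this toy data the AUC is 0.95 and the selected threshold is 0.6 (sensitivity 0.8, specificity 1.0). In practice the scores would be the predicted probabilities of a fitted classifier, and the threshold chosen on the model-building cohort would be applied unchanged to the external validation cohort.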