A machine learning–based multidimensional model integrating clinical, radiomics, and cell-free DNA methylation biomarkers for the classification of pulmonary nodules.

Wenhua Liang, Jinsheng Tao, Minhua Peng, Zhiwei Chen, Xixiang Tu, Zhujia Ye, Jianxing He, Bo Wang, Yang Yang, Jianbing Fan, Xiangcheng Qiu

doi:10.1200/jco.2023.41.16_suppl.3070

Abstract

3070 Background: Excessive invasive procedures in patients with pulmonary nodules are a pressing clinical problem. We sought to develop a noninvasive, machine learning-based multidimensional tool combining clinical, radiomic, and cell-free DNA (cfDNA) methylation biomarkers to improve the accuracy of pulmonary nodule classification. Methods: This trial, with prospective sample collection and retrospective blinded evaluation, enrolled a total of 1,276 subjects at 24 hospitals in China. All patients had a 5-30 mm pulmonary nodule at high risk of lung cancer and had undergone surgical resection with a definitive pathological diagnosis. Clinical information, preoperative peripheral blood, and chest CT scans were collected. The regions of interest (ROIs) containing the target nodule on the CT images were automatically segmented by a deep learning-based model, and 2,153 radiomics features were extracted from the ROIs using PyRadiomics. Based on clinical and radiomics features, four classification models were constructed using the LightGBM, Lasso, Random Forest, and Logistic Regression algorithms. The predicted probabilities of these four models were then averaged to obtain the final score of the combined clinical and radiomic biomarkers model (CRBM), which was developed in a training set (n=797). We then integrated the CRBM with our previously established cfDNA methylation model (PulmoSeek; DOI: 10.1172/JCI145973) using logistic regression (n=201) to create a new combined model, PulmoSeek Plus V2.0, and validated it independently (n=278). ROC curves were compared to evaluate the diagnostic performance of the CRBM, PulmoSeek, and PulmoSeek Plus V2.0 models, with pathologic diagnosis as the gold standard. Results: The CRBM achieved AUCs of 0.81 (95% CI 0.73-0.90) and 0.80 (0.74-0.86) in the two validation sets (n1=201, n2=278), respectively. In the training set (n=201) and validation set (n=278), PulmoSeek Plus V2.0 obtained AUCs of 0.93 (0.90-0.97) and 0.91 (0.88-0.95) and accuracies of 0.89 (0.84-0.93) and 0.84 (0.79-0.88), respectively. In the combined set (n=479), compared with the CRBM and PulmoSeek, PulmoSeek Plus V2.0 improved AUC by 11% and 6% and accuracy by 6% and 3%, respectively. Used for rule-out at a fixed specificity of 50%, PulmoSeek Plus V2.0 had an overall sensitivity of 0.98 (0.96-0.99), a PPV of 0.86 (0.82-0.89), and an NPV of 0.998 (0.988-1.000, at 5% prevalence). It maintained good diagnostic performance in early-stage lung cancer (stage 0-I, n=328) and in 5-10 mm nodules (n=92), with sensitivities of 0.98 (0.96-0.99) and 0.98 (0.92-0.99), respectively. Conclusions: PulmoSeek Plus V2.0, a novel machine learning-based multidimensional model, improves the accuracy of pulmonary nodule classification and may reduce unnecessary invasive procedures among individuals with benign nodules. Clinical trial information: NCT03181490, NCT03651986.
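The NPV quoted at 5% prevalence can be checked from the reported sensitivity and specificity with Bayes' rule. The following is a minimal sketch, not the authors' code: the function name `predictive_values` is hypothetical, and it uses only the point estimates given in the abstract (sensitivity 0.98, specificity 0.50, prevalence 0.05).

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    # Expected fractions of the population in each cell of the 2x2 table.
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Point estimates taken from the abstract; prevalence is the stated 5%.
ppv, npv = predictive_values(sensitivity=0.98, specificity=0.50, prevalence=0.05)
print(f"NPV at 5% prevalence: {npv:.3f}")
```

With these inputs the NPV rounds to 0.998, in line with the reported value. The PPV depends strongly on prevalence, so the reported PPV of 0.86 presumably reflects the malignancy rate of the surgical cohort, which the abstract does not state, rather than the 5% setting.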
