Abstract
Background
Chronic obstructive pulmonary disease (COPD) is a major public health problem and cause of mortality worldwide. However, early-stage COPD usually goes unrecognized and undiagnosed, so a risk model that predicts COPD development is needed.
Methods
A total of 441 COPD patients and 192 control subjects were recruited, and 101 single-nucleotide polymorphisms (SNPs) were genotyped using the MassArray assay. Using 5 clinical features together with the SNPs, 6 predictive models were built and evaluated in the training and test sets using the confusion matrix, the area under the receiver operating characteristic curve (AU-ROC), the area under the precision-recall curve (AU-PRC), sensitivity (recall), specificity, accuracy, F1 score, Matthews correlation coefficient (MCC), positive predictive value (PPV; precision) and negative predictive value (NPV). The selected features were then ranked by importance.
Results
Nine SNPs were significantly associated with COPD. Six of them (rs1007052, OR = 1.671, P = 0.010; rs2910164, OR = 1.416, P < 0.037; rs473892, OR = 1.473, P < 0.044; rs161976, OR = 1.594, P < 0.044; rs159497, OR = 1.445, P < 0.045; and rs9296092, OR = 1.832, P < 0.045) were risk factors for COPD, while 3 SNPs (rs8192288, OR = 0.593, P < 0.015; rs20541, OR = 0.669, P < 0.018; and rs12922394, OR = 0.651, P < 0.022) were protective factors against COPD development. In the training set, the k-nearest neighbors (KNN), logistic regression (LR), support vector machine (SVM), decision tree (DT) and XGBoost models obtained AU-ROC values above 0.82 and AU-PRC values above 0.92. Among these models, XGBoost obtained the highest AU-ROC (0.94), AU-PRC (0.97), accuracy (0.91), precision (0.95), F1 score (0.94), MCC (0.77) and specificity (0.85), while the multilayer perceptron (MLP) obtained the highest sensitivity (recall) (0.99) and NPV (0.87). In the validation set, KNN, LR and XGBoost obtained AU-ROC values above 0.80 and AU-PRC values above 0.85. KNN had the highest precision (0.82), and KNN and LR shared the highest accuracy (0.81) and the highest F1 score (0.86). DT and MLP both obtained sensitivity (recall) above 0.94 and NPV above 0.84.
In the feature importance analyses, AQCI, age and BMI had the greatest impact on the predictive ability of the models, while SNPs, sex and smoking were less important.
Conclusions
The KNN, LR and XGBoost models showed excellent overall predictive power, and machine learning tools combining clinical and SNP features were suitable for predicting the risk of COPD development.
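The abstract reports the evaluation metrics without giving their formulas. The following is a minimal pure-Python sketch of the standard definitions, not the authors' code: it derives the threshold-based metrics from a binary confusion matrix with COPD as the positive class, and computes AU-ROC as the probability that a randomly chosen case scores higher than a randomly chosen control. The helper names `confusion_metrics` and `roc_auc` and all example counts and scores are hypothetical; AU-PRC is not shown.

```python
import math
from typing import Dict, Sequence


def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Threshold-based metrics from a binary confusion matrix (COPD = positive class)."""

    def safe_div(num: float, den: float) -> float:
        # Return 0.0 rather than raising when a denominator is zero.
        return num / den if den else 0.0

    sensitivity = safe_div(tp, tp + fn)  # recall, true positive rate
    specificity = safe_div(tn, tn + fp)  # true negative rate
    ppv = safe_div(tp, tp + fp)          # precision
    npv = safe_div(tn, tn + fn)
    accuracy = safe_div(tp + tn, tp + fp + tn + fn)
    f1 = safe_div(2 * ppv * sensitivity, ppv + sensitivity)
    # Matthews correlation coefficient uses all four cells of the matrix.
    mcc = safe_div(tp * tn - fp * fn,
                   math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return {"sensitivity": sensitivity, "specificity": specificity,
            "ppv": ppv, "npv": npv, "accuracy": accuracy, "f1": f1, "mcc": mcc}


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AU-ROC as P(score of a random case > score of a random control); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical counts and scores, not taken from the study.
print(confusion_metrics(tp=80, fp=10, tn=30, fn=20))
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

The pairwise AU-ROC comparison is quadratic in the number of subjects, which is acceptable for a cohort of a few hundred people; a rank-based formula scales better for large samples.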
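The abstract labels SNPs with an odds ratio (OR) above 1 as risk factors and those below 1 as protective factors. It does not state the genetic model, the test used, or whether the ORs were adjusted for age, sex or smoking, so the sketch below only illustrates the basic quantity: a crude allelic OR from a 2 × 2 table of allele counts, with a Woolf (log-scale Wald) confidence interval. The function name `allelic_odds_ratio` and the allele counts are hypothetical; only the cohort sizes (441 cases, 192 controls) come from the text.

```python
import math
from typing import Tuple


def allelic_odds_ratio(case_risk: int, case_other: int,
                       ctrl_risk: int, ctrl_other: int,
                       z: float = 1.96) -> Tuple[float, float, float]:
    """Crude (unadjusted) allelic OR with a Woolf CI (95% with the default z)."""
    cells = (case_risk, case_other, ctrl_risk, ctrl_other)
    if min(cells) <= 0:
        raise ValueError("all allele counts must be positive")
    # Odds of the risk allele in cases divided by its odds in controls.
    odds_ratio = (case_risk * ctrl_other) / (case_other * ctrl_risk)
    # Standard error of log(OR) is the square root of the summed reciprocal counts.
    se_log_or = math.sqrt(sum(1.0 / c for c in cells))
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical allele counts: 441 cases -> 882 alleles, 192 controls -> 384 alleles.
print(allelic_odds_ratio(case_risk=300, case_other=582, ctrl_risk=100, ctrl_other=284))
```

Swapping which allele is treated as the "risk" allele inverts the OR, which is why the same SNP can be described as a risk or a protective factor depending on the reference allele.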
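The abstract does not name the method used to rank feature importance. Permutation importance is one common model-agnostic choice, and the sketch below assumes it rather than reproducing the authors' procedure: each feature column is shuffled in turn, and the mean drop in accuracy is recorded. `predict` stands for any fitted classifier's prediction function; `permutation_importance`, `toy_predict` and the toy data are hypothetical.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]


def permutation_importance(predict: Callable[[List[Row]], List[int]],
                           X: Sequence[Sequence[float]], y: Sequence[int],
                           n_repeats: int = 10, seed: int = 0) -> List[float]:
    """Mean drop in accuracy when each feature column is randomly shuffled."""
    rng = random.Random(seed)
    rows = [list(r) for r in X]

    def accuracy(data: List[Row]) -> float:
        preds = predict(data)
        return sum(int(p == t) for p, t in zip(preds, y)) / len(y)

    baseline = accuracy(rows)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between feature j and the outcome
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(permuted))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical toy model: predicts COPD (1) from feature 0 only, ignoring feature 1.
def toy_predict(data: List[Row]) -> List[int]:
    return [1 if r[0] > 0.5 else 0 for r in data]


X = [[0.1, 5.0], [0.9, 3.0], [0.2, 1.0], [0.8, 7.0],
     [0.3, 2.0], [0.7, 4.0], [0.4, 6.0], [0.6, 8.0]]
y = [0, 1, 0, 1, 0, 1, 0, 1]
print(permutation_importance(toy_predict, X, y))
```

Sorting features by these mean drops yields a ranking of the kind the abstract reports; averaging over several shuffles reduces the noise of any single permutation.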
Highlights
Chronic obstructive pulmonary disease (COPD) is a major public health problem and cause of mortality worldwide
The k-nearest neighbors classifier (KNN), logistic regression (LR) and XGBoost models showed excellent overall predictive power, and machine learning tools combining clinical and single nucleotide polymorphism (SNP) features were suitable for predicting the risk of COPD development
The results indicated that COPD patients were more likely to be older, male and smokers, and that forced expiratory volume in one second (FEV1)/forced vital capacity (FVC) (%) and FEV1 (%) values were lower in the COPD group than in the healthy controls
Summary
Chronic obstructive pulmonary disease (COPD) is a major public health challenge and cause of mortality worldwide because of its high prevalence and the associated disability, mortality and socioeconomic burden [1,2,3]. Ninety percent of COPD-related deaths occur in Asia and Africa [4]. In 2013, more than 0.9 million COPD-related deaths occurred, and COPD was reported to be the third leading cause of death in China [5]. The typical symptoms of COPD include dyspnea, chronic cough and sputum production, and spirometry is considered the gold-standard method for diagnosing COPD [6]. Beyond establishing the diagnosis, spirometry provides a useful description of the severity of pathological changes in COPD. COPD is clinically defined as a post-bronchodilator FEV1/FVC of less than 70% of the predicted value and an FEV1 of less than 80% of the predicted value [8].