Abstract
The aim of this project was to identify candidate novel therapeutic targets to facilitate the treatment of COPD using machine-based learning (ML) algorithms and penalized regression models. In this study, 59 healthy smokers, 53 healthy non-smokers and 21 COPD smokers (9 GOLD stage I and 12 GOLD stage II) were included (n = 133). 20,097 probes were generated from a small airway epithelium (SAE) microarray dataset obtained from these subjects previously. Subsequently, the association between gene expression levels and smoking and COPD, respectively, was assessed using: AdaBoost Classification Trees, Decision Tree, Gradient Boosting Machines, Naive Bayes, Neural Network, Random Forest, Support Vector Machine and adaptive LASSO, Elastic-Net, and Ridge logistic regression analyses. Using this methodology, we identified 44 candidate genes, 27 of these genes had been previously been reported as important factors in the pathogenesis of COPD or regulation of lung function. Here, we also identified 17 genes, which have not been previously identified to be associated with the pathogenesis of COPD or the regulation of lung function. The most significantly regulated of these genes included: PRKAR2B, GAD1, LINC00930 and SLITRK6. These novel genes may provide the basis for the future development of novel therapeutics in COPD and its associated morbidities.
Highlights
From a genetic perspective, Chronic obstructive pulmonary disease (COPD) is a complex disease arising from mutations in multiple alleles and the lack of integration of data in this disease has been attributed to dispersed, independent genome-wide association studies (GWAS)[7]
54,675 probes were screened using the microarray dataset generated from small airway epithelium (SAE) cells previously from: 59 healthy smokers, 53 healthy non-smokers and 21 COPD smokers (42.8% of GOLD stage I and 57.2% of GOLD stage II) (Table 1)[14]
Chronic obstructive pulmonary disease (COPD) is a progressive inflammatory disease characterized by airway obstruction and is predicted to be among the first three causes of death worldwide[1,2]
Summary
COPD is a complex disease arising from mutations in multiple alleles and the lack of integration of data in this disease has been attributed to dispersed, independent genome-wide association studies (GWAS)[7]. In this study, we employed a novel methodology, namely machine-based learning algorithms combined with penalized regression models, in order to study genomic change in COPD in a more selective manner. This study was designed to apply signaling-network methodology with machine-based learning methods to better understand the genetic etiology of smoking exposure and COPD in 59 healthy smokers, 53 healthy non-smokers and 21 COPD smokers (9 of GOLD stage I and 12 of GOLD stage II) were included (Total: n = 133). We identified 17 novel genes, which were not previously associated with COPD, the regulation of lung function or smoking exposure. The most significantly regulated of these genes included: PRKAR2B, GAD1, LINC00930, and SLITRK6 These novel genes may provide the basis for the future development of novel therapeutics in COPD and warrant further investigation and validation
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.