Abstract

This study aimed to introduce novel techniques for identifying genes associated with the development of chronic obstructive pulmonary disease (COPD) and to prioritize COPD candidate genes using regression methods. It is a secondary analysis of data from an experimental study. We used penalized logistic regression with three types of penalty: the least absolute shrinkage and selection operator (LASSO), the minimax concave penalty (MCP), and the smoothly clipped absolute deviation (SCAD). The models were trained on genome-wide expression profiles to define gene networks relevant to COPD stages. A 10-fold cross-validation scheme was used to evaluate the performance of the methods, and the results were further checked by external validation. We report the sensitivity, specificity, and area under the receiver operating characteristic curve (AUC) of each model. The LASSO, SCAD, and MCP models identified 21, 22, and 18 significantly associated genes, respectively. MCP was the most statistically conservative method (it selected the fewest significant features): all 18 genes it detected were also detected by the other two approaches. The best-performing approach was SCAD-penalized logistic regression (AUC = 96.26, sensitivity = 94.2, specificity = 86.96). All three models shared a common panel of 18 genes with significant positive or negative associations with COPD; among these, RNF130, STX6, PLCB1, CACNA1G, LARP4B, LOC100507634, SLC38A2, and STIM2 had odds ratios (ORs) greater than 1. The differences between the penalized methods were slight. Regularization overcomes the high-dimensionality problem that arises when this kind of regression is applied to genome-wide expression data, so the regression-based approaches presented here can help address that problem. They also allow faster exploration of how these genes affect the outcome and of the underlying mechanisms.
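The abstract names three penalties and three performance measures without defining them. The Python below is an illustrative sketch, not the authors' code. It shows the single-coefficient update rule (thresholding operator) that each penalty induces inside a coordinate-descent solver for standardized predictors, together with the sensitivity, specificity, AUC, and 10-fold split used for evaluation. All function names are hypothetical. The tuning constants a = 3.7 (SCAD) and gamma = 3 (MCP) are common defaults from the literature and are assumed here, because the abstract does not state the study's values. Fitting the full penalized logistic model, for example with R packages such as glmnet (LASSO) or ncvreg (MCP and SCAD), is not shown.

```python
"""Hypothetical sketch: the three penalties' update rules and the reported metrics.

Assumptions (not from the study): lam is the penalty strength, a=3.7 for SCAD and
gamma=3.0 for MCP, and z is the unpenalized coordinate-wise estimate for a
standardized predictor. The full penalized logistic fit is not implemented.
"""
import math


def soft_threshold(z, lam):
    """LASSO update: shrink z toward 0 by lam; small values become exactly 0."""
    return math.copysign(max(abs(z) - lam, 0.0), z)


def scad_threshold(z, lam, a=3.7):
    """SCAD update: LASSO-like near 0, linear transition, no shrinkage beyond a*lam."""
    if abs(z) <= 2 * lam:
        return soft_threshold(z, lam)
    if abs(z) <= a * lam:
        return soft_threshold(z, a * lam / (a - 1)) / (1 - 1 / (a - 1))
    return z


def mcp_threshold(z, lam, gamma=3.0):
    """MCP update: shrinkage fades out and stops entirely beyond gamma*lam."""
    if abs(z) <= gamma * lam:
        return soft_threshold(z, lam) / (1 - 1 / gamma)
    return z


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP/(TP+FN), specificity = TN/(TN+FP) for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """AUC as the probability that a random case outranks a random control (ties = 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def kfold_indices(n, k=10):
    """Split sample indices 0..n-1 into k contiguous folds of near-equal size (no shuffling)."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


if __name__ == "__main__":
    # Compare how each penalty treats a small, medium, and large coefficient.
    for z in (0.5, 2.5, 5.0):
        print(z, soft_threshold(z, 1.0), scad_threshold(z, 1.0), mcp_threshold(z, 1.0))
```

With lam = 1, all three rules set the small value 0.5 exactly to zero, which is how these penalties perform gene selection. For the large value 5.0, LASSO still shrinks it to 4.0, whereas SCAD and MCP return it unchanged. Reducing bias on strong effects in this way is the usual motivation for preferring the non-convex penalties. In practice the data would be shuffled, and often stratified by class, before assigning folds; the contiguous split above only illustrates the partitioning.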
