Abstract
Background
Although diagnostic methods for coronary atherosclerosis heart disease (CAD) are constantly improving, early-stage CAD is still often missed because it produces no symptoms. Gene expression levels vary during disease development; therefore, a classifier based on gene expression might contribute to CAD diagnosis. This study aimed to construct genetic classification models for CAD using gene expression data, which may provide new insight into its pathogenesis.
Methods
All statistical analyses were performed in R 3.4.4. Three raw gene expression datasets related to CAD (GSE12288, GSE7638 and GSE66360) were downloaded from the Gene Expression Omnibus database and included in the analysis. The limma package was used to identify differentially expressed genes (DEGs) between CAD samples and healthy controls. The WGCNA package was used to identify CAD-related gene modules and hub genes, followed by recursive feature elimination to select the optimal feature genes (OFGs). Genetic classification models were built using support vector machine (SVM), random forest (RF) and logistic regression (LR). Further validation and receiver operating characteristic (ROC) curve analysis were conducted to evaluate classification performance.
Results
In total, 374 DEGs, eight gene modules, 33 hub genes and 12 OFGs (HTR4, KISS1, CA12, CAMK2B, KLK2, DDC, CNGB1, DERL1, BCL6, LILRA2, HCK, MTF2) were identified. In validation, the SVM, RF and LR classifiers achieved accuracies of 75.58%, 63.57% and 63.95%, with areas under the ROC curve of 0.813 (95% confidence interval [CI] 0.761–0.866, P < 0.0001), 0.727 (95% CI 0.665–0.788, P < 0.0001) and 0.783 (95% CI 0.725–0.841, P < 0.0001), respectively.
Conclusions
This study identified 12 signature genes involved in the pathogenic mechanism of CAD. Among the CAD classifiers built with the three machine learning methods, the SVM model performed best.
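The abstract reports two different performance measures for each classifier: accuracy, and area under the ROC curve (AUC). The two need not rank models the same way, as with RF and LR here. The following is a minimal Python sketch of how each measure can be computed from predicted scores. It is not the authors' code (the study was carried out in R), the function names `roc_auc` and `accuracy` are hypothetical, the 0.5 threshold is an assumption, and the labels and scores are made-up toy values rather than data from the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive sample receives the higher score; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples classified correctly at a fixed score threshold."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


# Toy example (hypothetical values): 1 = CAD, 0 = healthy control.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]

auc = roc_auc(labels, scores)   # threshold-free ranking quality
acc = accuracy(labels, scores)  # depends on the chosen threshold
print(f"AUC = {auc:.3f}, accuracy = {acc:.3f}")
```

Because AUC summarises ranking over all thresholds while accuracy is tied to a single cut-off, a model can have the higher AUC and the lower accuracy, or the reverse.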
Highlights
Although diagnostic methods for coronary atherosclerosis heart disease (CAD) are constantly improving, early-stage CAD is still often missed because it produces no symptoms
The module eigengenes (MEs) of the blue, green, yellow, brown, pink and red modules were positively correlated with CAD status (r > 0, P < 0.05), while the MEs of the turquoise and black modules were negatively correlated with CAD status (r < 0, P < 0.05)
Validation and evaluation of classifier performance: the support vector machine (SVM), random forest (RF) and logistic regression (LR) classifiers correctly classified 94.59%, 95.50% and 97.30% (108 samples) of the 111 samples in internal validation, respectively
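The internal-validation percentages can be cross-checked against the sample size of 111. The short Python sketch below searches for the whole-number counts of correctly classified samples that round to each reported percentage. Only the LR count (108) is stated in the text; any other counts the script prints are inferred by arithmetic, not reported by the source. The helper name `matching_counts` is hypothetical.

```python
def matching_counts(percent, total):
    """Return every integer count k for which k/total, expressed as a
    percentage rounded to two decimals, equals the reported value."""
    return [k for k in range(total + 1)
            if round(100 * k / total, 2) == percent]


total = 111  # internal-validation samples, as stated in the text
reported = {"SVM": 94.59, "RF": 95.50, "LR": 97.30}

# Inferred counts; each list should hold exactly one value if the
# reported percentage is consistent with 111 samples.
inferred = {name: matching_counts(pct, total) for name, pct in reported.items()}

for name, counts in inferred.items():
    print(name, reported[name], counts)
```

Each percentage matches exactly one count, so the reported figures are mutually consistent with a validation set of 111 samples.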
Summary
Although diagnostic methods for coronary atherosclerosis heart disease (CAD) are constantly improving, early-stage CAD is still often missed because it produces no symptoms. Gene expression levels vary during disease development, so a classifier based on gene expression might contribute to CAD diagnosis. This study aimed to construct genetic classification models for CAD using gene expression data, which may provide new insight into its pathogenesis. A disease report (2018) estimated that about 290 million people suffer from cardiovascular diseases (CVDs), 11 million of whom are CAD patients [3]. Early-stage CAD is still often missed because symptoms are absent or the disease is mild [8].
Source: Peng et al., BMC Cardiovascular Disorders (2022) 22:42.