This study aimed to identify neutrophil extracellular trap (NET)-associated gene features in the blood of patients with myocardial infarction (MI) using bioinformatics and machine learning, and to explore their potential diagnostic utility in atherosclerosis. The datasets GSE66360 and GSE48060 were downloaded from the Gene Expression Omnibus (GEO) public database. GSE66360 was used as the training set, and GSE48060 as an independent validation set. Differentially expressed NET-related genes were screened using R software, and machine learning was performed on their expression across samples. Four machine learning algorithms (Random Forest [RF], Extreme Gradient Boosting [XGBoost, XGB], Generalized Linear Models [GLM], and Support Vector Machine-Recursive Feature Elimination [SVM-RFE]) were compared, and the best-performing method was used to screen feature genes and construct a diagnostic model, which was then validated in the external validation dataset. Correlations between feature genes and immune cells were analyzed, and samples were reclustered based on the expression of the feature genes. Differences in downstream molecular mechanisms and immune responses were explored between clusters. Weighted Gene Co-expression Network Analysis (WGCNA) was performed on the clusters, and disease-related NET genes were extracted, followed by Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis. Finally, Mendelian randomization was employed to investigate the causal relationship between the expression of the model genes and the occurrence of MI. Forty-seven differentially expressed NET-related genes were obtained. After comparison of the four machine learning methods, the support vector machine was used to screen ATG7, MMP9, interleukin 6 (IL6), DNASE1, and PDE4B as key genes for constructing the diagnostic model. The diagnostic value of the model was validated in the independent external validation dataset.
These five genes showed strong correlations with neutrophils. The sample clusters also showed differential enrichment in pathways such as nitrogen metabolism, complement and coagulation cascades, cytokine-cytokine receptor interaction, the renin-angiotensin system, and steroid biosynthesis. The Mendelian randomization results indicated a causal relationship between the expression of ATG7 and the incidence of MI. The feature genes ATG7, MMP9, IL6, DNASE1, and PDE4B, identified using bioinformatics, may serve as potential diagnostic biomarkers and therapeutic targets for MI. In particular, the expression of ATG7 may be a significant factor in the occurrence of MI.
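
The abstract does not state which Mendelian randomization estimator was used. As an illustration only, the sketch below computes the fixed-effect inverse-variance weighted (IVW) estimate, a commonly used estimator, from per-variant summary statistics. The function name `ivw_estimate` and all numeric values are hypothetical and are not taken from the study.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted (IVW) causal estimate.

    beta_exposure: per-variant effects on the exposure (e.g. gene expression)
    beta_outcome:  per-variant effects on the outcome (e.g. MI)
    se_outcome:    standard errors of the outcome effects
    Returns (estimate, standard_error).
    """
    n = len(beta_exposure)
    if n == 0 or n != len(beta_outcome) or n != len(se_outcome):
        raise ValueError("inputs must be non-empty and of equal length")

    # Each variant is weighted by the inverse variance of its outcome effect.
    weights = [1.0 / (se ** 2) for se in se_outcome]
    numerator = sum(w * bx * by
                    for w, bx, by in zip(weights, beta_exposure, beta_outcome))
    denominator = sum(w * bx * bx for w, bx in zip(weights, beta_exposure))
    if denominator == 0:
        raise ValueError("all exposure effects are zero")

    estimate = numerator / denominator
    standard_error = math.sqrt(1.0 / denominator)
    return estimate, standard_error


# Hypothetical summary statistics for three genetic instruments (not study data).
bx = [0.10, 0.20, 0.15]
by = [0.05, 0.09, 0.08]
se = [0.01, 0.02, 0.015]

est, est_se = ivw_estimate(bx, by, se)
print(f"IVW estimate: {est:.3f} (SE {est_se:.3f})")
```

A positive estimate with a small standard error would be read as evidence that higher genetically predicted exposure raises the outcome risk, subject to the usual instrument validity assumptions.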