BackgroundTo construct a diagnostic model for Bipolar Disorder (BD) depressive phase using peripheral tissue RNA data from patients and combining Random Forest with Feedforward Neural Network methods. MethodsDatasets GSE23848, GSE39653, and GSE69486 were selected, and differential gene expression analysis was conducted using the limma package in R. Key genes from the differentially expressed genes were identified using the Random Forest method. These key genes' expression levels in each sample were used to train a Feedforward Neural Network model. Techniques like L1 regularization, early stopping, and dropout layers were employed to prevent model overfitting. Model performance was then validated, followed by GO, KEGG, and protein-protein interaction network analyses. ResultsThe final model was a Feedforward Neural Network with two hidden layers and two dropout layers, comprising 2345 trainable parameters. Model performance on the validation set, assessed through 1000 bootstrap resampling iterations, demonstrated a specificity of 0.769 (95 % CI 0.571–1.000), sensitivity of 0.818 (95 % CI 0.533–1.000), AUC value of 0.832 (95 % CI 0.642–0.979), and accuracy of 0.792 (95 % CI 0.625–0.958). Enrichment analysis of key genes indicated no significant enrichment in any known pathways. ConclusionKey genes with biological significance were identified based on the decrease in Gini coefficient within the Random Forest model. The combined use of Random Forest and Feedforward Neural Network to establish a diagnostic model showed good classification performance in Bipolar Disorder.
Read full abstract