Identifying the set of genes collectively responsible for causing a disease from differential gene expression data is called the gene selection problem. Although many complex methodologies have been applied to gene selection formulated as an optimization problem, this study introduces a simple, efficient, and biologically plausible solution procedure that focuses on the collective power of the targeted gene set to discriminate between diseased and normal gene expression profiles. The procedure uses Simulated Annealing to solve the underlying optimization problem and is termed here Differential Gene Expression Based Simulated Annealing (DGESA). The Ranked Variance (RV) method was applied to prioritize genes and form a reference set against which the outcome of DGESA is compared. In a case study on Eosinophilic Esophagitis (EoE) and other gastrointestinal diseases, RV identified the top 40 high-variance genes, which overlapped with the disease-causing genes identified by DGESA. DGESA identified 40 gene pathways each for EoE, Crohn's Disease (CD), and Ulcerative Colitis (UC), with 10 genes for EoE, 8 for CD, and 7 for UC confirmed in the literature. For EoE, the confirmed genes include KRT79, CRISP2, IL36G, SPRR2B, SPRR2D, and SPRR2E. For CD, the validated genes include NPDC1, SLC2A4RG, LGALS8, CDKN1A, XAF1, and CYBA. For UC, the confirmed genes include TRAF3, BAG6, CCDC80, CDC42SE2, and HSPA9. Together, RV and DGESA help elucidate molecular signatures in gastrointestinal diseases. The validation of genes such as SPRR2B, SPRR2D, SPRR2E, and STAT6 for EoE demonstrates DGESA's efficacy and highlights potential targets for future research.
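The two ingredients named above, variance-based ranking and simulated annealing over candidate gene sets, can be illustrated with a minimal sketch. This is not the authors' DGESA implementation: the objective (`separation`, the distance between class mean profiles), the single-gene swap move, the geometric cooling schedule, and all function and parameter names are assumptions chosen only for illustration.

```python
import math
import random
from statistics import mean, pvariance


def ranked_variance(expr, top_k):
    """Return indices of the top_k genes with the highest variance.

    expr is a list of samples; each sample is a list of per-gene values.
    """
    n_genes = len(expr[0])
    variances = [pvariance([s[g] for s in expr]) for g in range(n_genes)]
    order = sorted(range(n_genes), key=lambda g: variances[g], reverse=True)
    return order[:top_k]


def separation(subset, diseased, normal):
    """Assumed objective: Euclidean distance between the diseased and normal
    mean expression profiles, restricted to the genes in subset."""
    return math.sqrt(sum(
        (mean(s[g] for s in diseased) - mean(s[g] for s in normal)) ** 2
        for g in subset
    ))


def anneal_gene_set(diseased, normal, set_size,
                    steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Search for a gene set of fixed size that maximizes separation().

    Requires set_size < number of genes so that a swap is always possible.
    """
    rng = random.Random(seed)
    n_genes = len(diseased[0])
    current = set(rng.sample(range(n_genes), set_size))
    current_score = separation(current, diseased, normal)
    best, best_score = set(current), current_score
    temp = t0
    for _ in range(steps):
        # Neighbor move: swap one selected gene for one unselected gene.
        out_gene = rng.choice(sorted(current))
        in_gene = rng.choice([g for g in range(n_genes) if g not in current])
        candidate = (current - {out_gene}) | {in_gene}
        cand_score = separation(candidate, diseased, normal)
        delta = cand_score - current_score
        # Always accept improvements; accept worse sets with a probability
        # that shrinks as the temperature falls.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_score = candidate, cand_score
            if current_score > best_score:
                best, best_score = set(current), current_score
        temp *= cooling  # geometric cooling (an assumed schedule)
    return sorted(best), best_score
```

In this sketch, `ranked_variance` would be run on the pooled samples to build a reference list, and `anneal_gene_set` on the diseased and normal samples separately supplied; the overlap between the two results could then be inspected, mirroring the comparison described above.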