Biomarkers play a crucial role across various fields by providing insight into biological responses to interventions. High-throughput gene expression profiling technologies enable the data-driven discovery of biomarkers from large datasets. This study focuses on identifying biomarkers in gene expression data related to chemical injury caused by mustard gas, covering a spectrum from healthy individuals to severe injuries. The study used RNA-Seq data comprising 52 samples with expression values for 54,583 gene transcripts. The samples were categorized into four classes based on the GOLD classification for chemically injured individuals: Severe (n=14), Moderate (n=11), Mild (n=16), and healthy controls (n=11). For data preparation, an Excel file created in the R programming environment using the MLSeq and devtools packages was examined. Feature selection was performed using a Genetic Algorithm (GA) and Simulated Annealing (SA), and the Random Forest algorithm was employed for classification. Ab initio methods ensured computational efficiency and result accuracy, while molecular dynamics simulation acted as a virtual experiment bridging the gap between experiment and theory. A total of 12 models were created, each yielding a list of differentially expressed genes as potential biomarkers. Model performance varied across group comparisons: GA outperformed SA in most comparisons, while SA performed better in specific cases involving the Moderate and Mild groups. For the Severe vs. Moderate comparison, GA achieved the best performance, with an accuracy of 94.38%, a recall of 91.64%, and a specificity of 97.10%. The candidate biomarkers were then evaluated against the gene expression data to assess how their expression changes between the different groups of chemically injured individuals.
Four genes were selected for further investigation based on their expression levels: CXCR1, EIF2B2, RAD51, and RXFP2. The expression levels of these genes were analyzed to determine their differential expression between the groups. This study was designed as a computational effort to identify diagnostic biomarkers in basic biological system research. Our findings propose a list of discriminative biomarkers capable of distinguishing between different groups of chemically injured individuals. The identification of these key genes highlights the potential for biomarkers to serve as indicators of chemical injury severity and warrants further investigation to validate their clinical relevance and their utility in diagnosis and treatment.
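As a point of reference for the accuracy, recall, and specificity figures reported above, the following is a minimal sketch of how these three metrics are commonly derived from the confusion matrix of a two-class comparison. It is not the study's own code: the function name `binary_metrics`, the choice of which class counts as "positive", and the toy labels are all assumptions made for illustration.

```python
import math


def binary_metrics(y_true, y_pred, positive):
    """Return accuracy, recall, and specificity for a two-class comparison.

    `positive` names the class treated as the positive label (an assumption
    here; the source does not say which class played that role).
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1  # positive sample correctly identified
        elif truth == positive:
            fn += 1  # positive sample missed
        elif pred == positive:
            fp += 1  # negative sample wrongly flagged as positive
        else:
            tn += 1  # negative sample correctly identified

    def ratio(numerator, denominator):
        # Undefined ratios (empty denominator) are reported as NaN.
        return numerator / denominator if denominator else math.nan

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "recall": ratio(tp, tp + fn),        # also called sensitivity
        "specificity": ratio(tn, tn + fp),
    }


# Toy labels for illustration only; these are not data from the study.
toy_true = ["Severe"] * 4 + ["Moderate"] * 4
toy_pred = ["Severe", "Severe", "Severe", "Moderate"] + ["Moderate"] * 4
toy_metrics = binary_metrics(toy_true, toy_pred, positive="Severe")
```

With these toy labels, one Severe sample is misclassified, so recall falls below specificity. In practice such metrics would typically be averaged over cross-validation folds, which is one way non-round percentages like those reported can arise.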