Abstract

Background: Differential expression (DE) analysis of transcriptomic data enables genome-wide analysis of gene expression changes associated with biological conditions of interest. Such analysis often yields a long list of genes that are differentially expressed between two or more groups. The identified differentially expressed genes (DEGs) can then be subjected to downstream analysis, such as determining enriched functional pathways or gene ontologies, to obtain further biological insight. Furthermore, DEGs are treated as candidate biomarkers, and a small subset of them may be identified as biomarkers using either biological knowledge or data-driven approaches.

Methods: In this work, we present a novel approach for identifying biomarkers from a list of DEGs by re-ranking them according to the Minimum Redundancy Maximum Relevance (MRMR) criterion using a repeated cross-validation feature selection procedure.

Results: Using gene expression profiles for 199 children with sepsis and septic shock, we identify 108 DEGs and propose a 10-gene signature for reliably predicting pediatric sepsis mortality with an estimated Area Under ROC Curve (AUC) score of 0.89.

Conclusions: Machine learning-based refinement of DE analysis is a promising tool for prioritizing DEGs and discovering biomarkers from gene expression profiles. Moreover, our reported 10-gene signature for pediatric sepsis mortality may facilitate the development of reliable diagnostic and prognostic biomarkers for sepsis.

Highlights

  • Differential expression (DE) analysis of transcriptomic data enables genome-wide analysis of gene expression changes associated with biological conditions of interest

  • Identification of differentially expressed genes and enriched pathways: based on an absolute fold change ≥1.5 and an adjusted p-value ≤0.05, 108 of 10,596 genes were found to be DEGs between surviving and non-surviving pediatric sepsis patients (see Additional file 1: Table S1 and Additional file 2: Fig. S1)

  • We present a data-driven approach to prioritizing marker genes, using an instance of the Minimum Redundancy Maximum Relevance (MRMR) feature selection algorithm that selects genes with the highest Area Under ROC Curve (AUC) for predicting pediatric sepsis mortality and the lowest redundancy among the selected genes, measured by Pearson’s correlation coefficient
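
The MRMR idea in the last highlight can be illustrated with a minimal greedy sketch. It is not the authors' implementation: the function names, the toy expression values, the "relevance minus mean absolute correlation" scoring rule, and the folding of AUC values below 0.5 are all assumptions made for illustration, and the repeated cross-validation wrapper described in the abstract is omitted.

```python
from itertools import product


def auc(scores, labels):
    """AUC of one gene used as a score for the positive class (pairwise form)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


def relevance(scores, labels):
    # Assumption: a gene that is lower in the positive class is treated as
    # equally informative, so AUC values below 0.5 are folded upwards.
    a = auc(scores, labels)
    return max(a, 1.0 - a)


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 for a constant vector."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5 if vx > 0 and vy > 0 else 0.0


def mrmr_rank(expr, labels, k):
    """Greedily pick k genes from expr (gene name -> per-sample values).

    Score = relevance - mean |Pearson r| with already selected genes.
    This difference form is one common MRMR variant, assumed here.
    """
    rel = {g: relevance(v, labels) for g, v in expr.items()}
    selected, remaining = [], set(expr)

    def score(g):
        if not selected:
            return rel[g]
        redundancy = sum(abs(pearson(expr[g], expr[s]))
                         for s in selected) / len(selected)
        return rel[g] - redundancy

    while remaining and len(selected) < k:
        # sorted() makes tie-breaking deterministic (alphabetical)
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: 3 survivors (0) and 3 non-survivors (1)
labels = [0, 0, 0, 1, 1, 1]
expr = {
    "geneA": [1, 2, 3, 4, 5, 6],              # separates the groups perfectly
    "geneB": [1.1, 2.1, 3.1, 4.1, 5.1, 6.1],  # near-copy of geneA
    "geneC": [2, 1, 3, 3, 5, 2],              # weaker, but less correlated
}
print(mrmr_rank(expr, labels, 2))
```

On this toy input, geneB is as relevant as geneA but almost perfectly correlated with it, so the second pick goes to the less redundant geneC. This is the behaviour that motivates re-ranking a DEG list rather than simply taking the top genes by individual AUC.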

Summary

Introduction

Differential expression (DE) analysis of transcriptomic data enables genome-wide analysis of gene expression changes associated with biological conditions of interest. Such analysis often yields a long list of genes that are differentially expressed between two or more groups. Existing physiological scoring tools commonly used in intensive care units (ICUs), such as Acute Physiology and Chronic Health Evaluation (APACHE) [7] and Sepsis-related Organ Failure Assessment (SOFA) [8], use clinical and laboratory measurements to quantify the severity of critical illness but provide little information about the risk of a poor outcome (e.g., mortality) at the onset of the disease [2].

