Abstract

Background: Conventional differential gene expression analysis by methods such as Student’s t-test, SAM, and Empirical Bayes often searches for statistically significant genes without considering the interactions among them. Network-based approaches provide a natural way to study these interactions and to investigate the rewiring of interactions in disease versus control groups. In this paper, we apply the weighted graphical LASSO (wgLASSO) algorithm to integrate a data-driven network model with prior biological knowledge (i.e., protein-protein interactions) for biological network inference. We propose a novel differentially weighted graphical LASSO (dwgLASSO) algorithm that builds group-specific networks and performs network-based differential gene expression analysis to select biomarker candidates by considering their topological differences between the groups.

Results: Through simulation, we showed that wgLASSO can achieve better performance in building biologically relevant networks than purely data-driven models (e.g., neighbor selection, graphical LASSO), even when only a moderate level of information is available as prior biological knowledge. We evaluated the performance of dwgLASSO for survival time prediction using two microarray breast cancer datasets previously reported by Bild et al. and van de Vijver et al. Compared with the top 10 significant genes selected by a conventional differential gene expression analysis method, the top 10 significant genes selected by dwgLASSO in the dataset from Bild et al. led to a significantly improved survival time prediction in the independent dataset from van de Vijver et al. Among the 10 genes selected by dwgLASSO, UBE2S, SALL2, XBP1 and KIAA0922 have been confirmed by literature survey to be highly relevant to breast cancer biomarker discovery. Additionally, we tested dwgLASSO on TCGA RNA-seq data acquired from patients with hepatocellular carcinoma (HCC), using tumor samples and their corresponding non-tumorous liver tissues. Improved sensitivity, specificity and area under curve (AUC) were observed when comparing dwgLASSO with a conventional differential gene expression analysis method.

Conclusions: The proposed network-based differential gene expression analysis algorithm dwgLASSO can achieve better performance than conventional differential gene expression analysis methods by integrating information at both gene expression and network topology levels. The incorporation of prior biological knowledge can lead to the identification of biologically meaningful genes in cancer biomarker studies.
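
To make the idea of a weighted graphical LASSO more concrete, the following is a minimal sketch rather than the authors' implementation. It assumes that prior knowledge enters as an entry-wise penalty matrix in which gene pairs with a known protein-protein interaction are penalised less than other pairs, and it solves the resulting problem with a generic ADMM scheme, which may differ from the solver used in the paper. The names `build_penalty`, `weighted_glasso_admm`, `low_weight`, `lam`, and `rho`, the toy chain network, and the chosen prior edges are all hypothetical.

```python
import numpy as np


def build_penalty(prior_edges, n_genes, lam, low_weight=0.2):
    """Entry-wise penalty: lam * weight, with a lighter weight on prior edges."""
    weights = np.ones((n_genes, n_genes))
    for i, j in prior_edges:
        # Assumption: a known interaction lowers the penalty on that entry.
        weights[i, j] = weights[j, i] = low_weight
    np.fill_diagonal(weights, 0.0)  # leave the diagonal unpenalised
    return lam * weights


def weighted_glasso_admm(S, penalty, rho=1.0, n_iter=300):
    """Minimise -logdet(Theta) + tr(S Theta) + sum_ij penalty_ij * |Theta_ij|."""
    p = S.shape[0]
    Z = np.eye(p)
    U = np.zeros((p, p))
    for _ in range(n_iter):
        # Theta-update: closed form via an eigendecomposition.
        eigval, eigvec = np.linalg.eigh(rho * (Z - U) - S)
        theta_eig = (eigval + np.sqrt(eigval ** 2 + 4.0 * rho)) / (2.0 * rho)
        Theta = (eigvec * theta_eig) @ eigvec.T
        # Z-update: soft-thresholding with an entry-specific threshold.
        A = Theta + U
        Z = np.sign(A) * np.maximum(np.abs(A) - penalty / rho, 0.0)
        # Scaled dual update.
        U += Theta - Z
    return Z  # sparse estimate of the precision matrix


# Toy data: six "genes" whose true network is a chain 0-1-2-3-4-5.
rng = np.random.default_rng(0)
n_genes = 6
precision_true = np.eye(n_genes) + 0.4 * (
    np.eye(n_genes, k=1) + np.eye(n_genes, k=-1)
)
X = rng.multivariate_normal(
    np.zeros(n_genes), np.linalg.inv(precision_true), size=100
)
S = np.cov(X, rowvar=False)

# Hypothetical prior knowledge covering only part of the true network.
prior_edges = [(0, 1), (2, 3)]
penalty = build_penalty(prior_edges, n_genes, lam=0.2)
precision_est = weighted_glasso_admm(S, penalty)

# Nonzero off-diagonal entries are the inferred edges.
edges = np.argwhere(np.triu(precision_est, k=1) != 0)
print("inferred edges:", [tuple(e) for e in edges])
```

Setting every weight to 1 reduces this sketch to an ordinary graphical LASSO, which makes it easy to compare the purely data-driven estimate with the prior-informed one on the same data.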

Highlights

  • Conventional differential gene expression analysis by methods such as Student’s t-test, SAM, and Empirical Bayes often searches for statistically significant genes without considering the interactions among them

  • We propose a novel algorithm called differentially weighted graphical least absolute shrinkage and selection operator (LASSO) for network-based differential gene expression analysis

  • We show the application of dwgLASSO on two independent microarray datasets from breast cancer patients for survival time prediction, and on TCGA RNA-seq data acquired from patients with hepatocellular carcinoma (HCC) for a classification task between tumor samples and their corresponding non-tumorous liver tissues

Summary

Introduction

Conventional differential gene expression analysis by methods such as Student’s t-test, SAM, and Empirical Bayes often searches for statistically significant genes without considering the interactions among them. Independent studies of the same clinical types of patients often lead to different sets of significant genes with only a few in common [4]. This may be attributed to the fact that genes are members of strongly intertwined biological pathways and are highly interactive with each other. As a result, when the number of genes is large, a relevance network tends to generate over-complicated networks that contain an overwhelming number of false positives. The Bayesian network is another classic data-driven network model [9]. A Bayesian network cannot model cyclic structures, such as feedback loops, which are common in biological networks.
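
The remark about relevance networks can be illustrated with a small simulation. This is only a sketch of one contributing factor, chance correlations among many genes measured on few samples, and is not necessarily the mechanism the authors have in mind. The sample size, gene count, and correlation threshold below are hypothetical choices; the genes are simulated as independent, so every edge the relevance network reports is a false positive.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_genes = 10, 200

# Independent Gaussian "expression" values: the true network has no edges.
X = rng.standard_normal((n_samples, n_genes))

# Relevance network: connect two genes when |Pearson correlation| is high.
corr = np.corrcoef(X, rowvar=False)
threshold = 0.7  # hypothetical cut-off
upper = np.triu_indices(n_genes, k=1)  # each gene pair counted once
n_pairs = upper[0].size
false_edges = int(np.sum(np.abs(corr[upper]) > threshold))

print(f"{false_edges} spurious edges among {n_pairs} gene pairs")
```

Because the number of gene pairs grows quadratically with the number of genes, even a small per-pair chance of exceeding the threshold translates into many spurious edges.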

