Abstract

Background: Inference of active regulatory cascades under specific molecular and environmental perturbations is a recurring task in transcriptional data analysis. Commercial tools based on large, manually curated networks of causal relationships offering such functionality have been used in thousands of articles in the biomedical literature. The adoption and extension of such methods in the academic community has been hampered by the lack of freely available, efficient algorithms and an accompanying demonstration of their applicability using current public networks.

Results: In this article, we propose a new statistical method that will infer likely upstream regulators based on observed patterns of up- and down-regulated transcripts. The method is suitable for use with public interaction networks with a mix of signed and unsigned causal edges. It subsumes and extends two previously published approaches and we provide a novel algorithmic method for efficient statistical inference. Notably, we demonstrate the feasibility of using the approach to generate biological insights given current public networks in the context of controlled in-vitro overexpression experiments, stem-cell differentiation data and animal disease models. We also provide an efficient implementation of our method in the R package QuaternaryProd available to download from Bioconductor.

Conclusions: In this work, we have closed an important gap in utilizing causal networks to analyze differentially expressed genes. Our proposed Quaternary test statistic incorporates all available evidence on the potential relevance of an upstream regulator. The new approach broadens the use of these types of statistics for highly curated signed networks in which ambiguities arise but also enables the use of networks with unsigned edges. We design and implement a novel computational method that can efficiently estimate p-values for upstream regulators in current biological settings. We demonstrate the ready applicability of the implemented method to analyze differentially expressed genes using the publicly available networks.
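
As a rough illustration of the kind of evidence the abstract refers to, the sketch below tallies how the observed direction of each downstream transcript agrees with a hypothesized regulator whose edges are signed ("+", "-") or unsigned ("?"). The function names, the edge encoding, and the `net_signed_score` summary are hypothetical choices made for this example; they are not the authors' definition of the Quaternary statistic and not the API of the QuaternaryProd package.

```python
from collections import Counter

def tally_evidence(edges, observed, direction=1):
    """Count agreement between predicted and observed transcript changes.

    edges:     dict transcript -> "+", "-" or "?" ("?" = unsigned edge); hypothetical encoding
    observed:  dict transcript -> +1 (up), -1 (down) or 0 (unchanged)
    direction: +1 if the regulator is hypothesized to be up, -1 if down
    """
    counts = Counter(correct=0, incorrect=0, unsigned_hit=0, unchanged=0)
    for transcript, sign in edges.items():
        obs = observed.get(transcript, 0)
        if obs == 0:
            counts["unchanged"] += 1      # no differential expression observed
        elif sign == "?":
            counts["unsigned_hit"] += 1   # regulated, but the edge predicts no direction
        else:
            predicted = direction if sign == "+" else -direction
            counts["correct" if obs == predicted else "incorrect"] += 1
    return counts

def net_signed_score(counts):
    # Illustrative summary only: agreements minus disagreements on signed edges.
    return counts["correct"] - counts["incorrect"]

# Toy regulator with five downstream transcripts (made-up data).
edges = {"A": "+", "B": "-", "C": "?", "D": "+", "E": "-"}
observed = {"A": 1, "B": -1, "C": -1, "D": -1, "E": 0}

up = tally_evidence(edges, observed, direction=1)
down = tally_evidence(edges, observed, direction=-1)
```

In this toy case the "regulator up" hypothesis agrees with more signed edges than "regulator down", while the unsigned edge contributes evidence of regulation without favoring either direction. A statistical test would additionally need a null distribution for such counts, which this sketch does not attempt.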

Highlights

  • Inference of active regulatory cascades under specific molecular and environmental perturbations is a recurring task in transcriptional data analysis

  • Results on simulated data: To illustrate the performance of the three scoring statistics (QS, Correctness score (CS), Enrichment score (ES)) in networks with various degrees of ambiguity, we consider a hypothetical network consisting of 20,000 transcripts and 5,000 potential upstream regulators

  • Using a time course of stem cell differentiation to a pancreatic endocrine fate, we previously showed that the CS statistic was able to identify Interleukin 6 (IL6) as a novel secreted factor involved in this process [5]

Introduction

Inference of active regulatory cascades under specific molecular and environmental perturbations is a recurring task in transcriptional data analysis. (Throughout this paper, the word "upstream" refers to regulators one step before a gene in a biological pathway.) Commercial products, such as Qiagen’s IPA application (http://www.ingenuity.com/), are based on manually curated networks with a large number of signed causal relationships extracted from nearly 5 million findings [1]. At the time of writing, Qiagen’s webpage (www.ingenuity.com/ipa) lists more than 10,000 citations of biomedical articles that use their commercial product on top of such a network. Such highly curated networks are not freely available to the academic community for further algorithmic development and generation of biomedical insights.

