Abstract
Transcriptional regulatory networks (TRNs) program cells to dynamically alter their gene expression in response to changing internal or environmental conditions. In this study, we develop a novel workflow for generating large-scale TRN models that integrates comparative genomics data, global gene expression analyses, and intrinsic properties of transcription factors (TFs). An assessment of this workflow using benchmark datasets for the well-studied γ-proteobacterium Escherichia coli showed that it outperforms expression-based inference approaches, having a significantly larger area under the precision-recall curve. Further analysis indicated that this integrated workflow captures different aspects of the E. coli TRN than expression-based approaches, potentially making the two kinds of approach highly complementary. We leveraged this new workflow and these observations to build a large-scale TRN model for the α-proteobacterium Rhodobacter sphaeroides that comprises 120 gene clusters, 1211 genes (including 93 TFs), 1858 predicted protein-DNA interactions and 76 DNA binding motifs. We found that ~67% of the predicted gene clusters in this TRN are enriched for functions ranging from photosynthesis or central carbon metabolism to environmental stress responses. We also found that members of many of the predicted gene clusters were consistent with prior knowledge in R. sphaeroides and/or other bacteria. Experimental validation of predictions from this R. sphaeroides TRN model showed that high precision and recall were also obtained for TFs involved in photosynthesis (PpsR), carbon metabolism (RSP_0489) and iron homeostasis (RSP_3341). In addition, this integrative approach enabled generation of TRNs with increased information content relative to R. sphaeroides TRN models built via other approaches. We also show how this approach can be used to simultaneously produce TRN models for each related organism used in the comparative genomics analysis. Our results highlight the advantages of integrating comparative genomics of closely related organisms with gene expression data to assemble large-scale TRN models with high-quality predictions.
Highlights
Coordinating cellular behavior in response to internal or external signals requires dynamic regulation at several levels [1,2]
We developed a new workflow to generate genome-scale transcriptional regulatory networks (TRNs) that integrates genome sequence information and gene expression data and takes into account properties of bacterial transcription factors (TFs)
We further demonstrated the utility of this workflow by building a large-scale TRN model for R. sphaeroides
Summary
Coordinating cellular behavior in response to internal or external signals requires dynamic regulation at several levels [1,2]. Of the various levels at which cellular activities are regulated, transcriptional regulatory networks (TRNs) represent an active area for modeling, as high-throughput techniques to monitor RNA levels and protein-DNA interactions can be applied in a wide range of organisms [2,3]. Using such datasets, one can analyze, model, and reverse-engineer TRNs [3,4]. Identifying directly co-regulated genes (i.e., genes that are both co-expressed and share conserved upstream regulatory sequences) is challenging, as de novo identification of functional DNA binding motifs from co-expression clusters is hampered by the fact that the functional sequences of interest are often underrepresented [17].