Abstract
Networks of regulatory relations between transcription factors (TFs) and their target genes (TGs), implemented through TF binding sites (TFBSs), are key features of biology. An idealized approach to solving such networks consists of starting from a consensus TFBS or a position weight matrix (PWM) to generate a high-accuracy list of candidate TGs for biological validation. Developing and evaluating such approaches remains a formidable challenge in regulatory bioinformatics. We perform a benchmark study on 34 Drosophila TFs to assess existing TFBS and cis-regulatory module (CRM) detection methods, with a strong focus on the use of multiple genomes. In particular, for CRM modelling, we investigate the addition of orthologous sites to a known PWM to construct phyloPWMs, and we assess the added value of phylogenetic footprinting to predict contextual motifs around known TFBSs. For CRM prediction, we compare motif conservation with network-level conservation approaches across multiple genomes. Choosing the optimal training and scoring strategies strongly enhances the performance of TG prediction for more than half of the tested TFs. Finally, we analyse a 35th TF, namely Eyeless, and find a significant overlap between predicted TGs and candidate TGs identified by microarray expression studies. In summary, we identify several ways to optimize TF-specific TG predictions, some of which can be applied to all TFs, and others that can be applied only to particular TFs. The ability to model known TF-TG relations, together with the use of multiple genomes, results in a significant step forward in solving the architecture of gene regulatory networks.
Highlights
The characterization and understanding of gene regulatory interaction networks that rigorously control the execution of genetic programs that make functional cells, tissues, and organisms is a key challenge for post-genome biology
Using the approach outlined above, we first asked if searching for homotypic clusters of transcription factor (TF) DNA-binding sites (TFBSs) is a generally applicable approach for detecting target genes (TGs)
A position weight matrix (PWM) is built from all the TFBSs of a specific TF in the dataset, including the TFBSs present in the left-out region
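As an illustration of the PWM step mentioned in the highlights, the following is a minimal sketch of how a log-odds PWM can be built from a set of aligned binding sites and used to score a sequence. It is not the authors' implementation: the function names (`build_pwm`, `score_window`, `best_hit`), the pseudocount of 1, the uniform background of 0.25 per base, and the toy sites are all assumptions chosen for the example, and the input is assumed to contain only A, C, G, and T.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM from equal-length aligned sites.

    Assumptions (not from the source): a pseudocount of 1 per base and a
    uniform background frequency of 0.25.
    """
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(BASES)
    pwm = []
    for pos in range(length):
        column = [site[pos].upper() for site in sites]
        # Log2 ratio of the smoothed base frequency to the background.
        pwm.append({
            base: math.log2((column.count(base) + pseudocount) / total / background)
            for base in BASES
        })
    return pwm


def score_window(pwm, window):
    """Sum the per-position log-odds scores for one window."""
    return sum(column[base] for column, base in zip(pwm, window.upper()))


def best_hit(pwm, sequence):
    """Return (score, start) of the highest-scoring window on one strand."""
    width = len(pwm)
    return max(
        (score_window(pwm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )


# Hypothetical toy sites, for illustration only.
toy_sites = ["TTGACA", "TTGACT", "TTGCCA", "TAGACA"]
toy_pwm = build_pwm(toy_sites)
score, start = best_hit(toy_pwm, "GGCATTGACAGGC")
print(start, round(score, 2))
```

A real TG prediction pipeline would also scan the reverse strand, apply a score threshold, and, for homotypic clusters, count the hits that fall within a window; those steps are omitted here to keep the sketch short.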
Summary
The characterization and understanding of gene regulatory interaction networks that rigorously control the execution of genetic programs that make functional cells, tissues, and organisms is a key challenge for post-genome biology. Such regulatory interactions are formed by transcription factors (TFs) and their target genes (TGs) and are implemented via TF DNA-binding sites (TFBSs) located in cis-regulatory modules (CRMs) of TGs. A CRM is a promoter or enhancer sequence that contains TFBSs for one or more TFs and that controls a specific aspect of the expression pattern of the TG [1]. The vast majority of these interactions remain to be discovered. This complexity means that it will be practically impossible to understand the logic and organization of gene regulatory networks without the application of genome-wide, TF-specific computational TG discovery methods. Experimental approaches would benefit greatly from being complemented by in silico TG discovery methods.