Abstract

We address the problem of finding statistically significant associations between cis-regulatory motifs and functional gene sets, in order to understand the biological roles of transcription factors. We develop a computational framework for this task, whose features include a new statistical score for motif scanning, the use of different scores for predicting targets of different motifs, and new ways to deal with redundancies among significant motif–function associations. This framework is applied to the recently sequenced genome of the jewel wasp, Nasonia vitripennis, making use of the existing knowledge of motifs and gene annotations in another insect genome, that of the fruitfly. The framework uses cross-species comparison to improve the specificity of its predictions, and does so without relying upon non-coding sequence alignment. It is therefore well suited for comparative genomics across large evolutionary divergences, where existing alignment-based methods are not applicable. We also apply the framework to find motifs associated with socially regulated gene sets in the honeybee, Apis mellifera, using comparisons with Nasonia, a solitary species, to identify honeybee-specific associations.

Highlights

  • Computational discovery and analysis of gene regulatory networks begins with the characterization of transcription factor (TF) motifs, through experimental or computational means

  • We systematically examine which motif scanning method ought to be used in practice, while proposing a new statistical score for motif scanning, and find different methods to be most efficacious for predicting the motif module for different TFs

  • We develop a computational pipeline for predicting the functions of transcription factor motifs, through DNA sequence analysis


Summary

Introduction

Computational discovery and analysis of gene regulatory networks begins with the characterization of transcription factor (TF) motifs, through experimental or computational means. The predicted binding sites may be used to annotate a set of genes (typically genes that are proximal to the sites) as being putative regulatory targets of the motif. In a later section (“Motif scanning methods”), we briefly review existing approaches to this problem, most of which are based on finding sites whose quality of match to the motif exceeds a threshold, or locations where clusters of above-threshold matches are found. Each of these approaches has its merits and problems, and it is not clear which method ought to be used in practice. We examine this issue systematically, while proposing a new statistical score for motif scanning, and find different methods to be most efficacious for predicting the motif module for different TFs.
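To make the two families of existing approaches concrete, the following is a minimal Python sketch of threshold-based scanning and of counting clusters of above-threshold matches. It is a generic illustration, not the new statistical score proposed in this work. The toy count matrix, the uniform background, the pseudocount, the threshold, and the window size are all assumptions chosen for the example, and only one strand is scanned.

```python
import math

BASES = "ACGT"

def log_odds_matrix(counts, background=0.25, pseudocount=1.0):
    """Convert per-position base counts into log2-odds scores
    against a uniform background (an assumption of this sketch)."""
    matrix = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        matrix.append({
            b: math.log2(((column[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return matrix

def scan(sequence, matrix, threshold):
    """Return (position, score) for each site scoring at or above threshold."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        site = sequence[start:start + width]
        if any(base not in BASES for base in site):
            continue  # skip sites containing ambiguous characters such as N
        score = sum(matrix[i][base] for i, base in enumerate(site))
        if score >= threshold:
            hits.append((start, score))
    return hits

def clustered_hits(hits, window_size, min_hits):
    """Return hit positions that open a window of window_size bases
    containing at least min_hits above-threshold matches."""
    positions = [pos for pos, _ in hits]
    clusters = []
    for i, start in enumerate(positions):
        in_window = sum(1 for pos in positions[i:] if pos < start + window_size)
        if in_window >= min_hits:
            clusters.append(start)
    return clusters

# Hypothetical 3-bp motif "TGA" built from 8 aligned sites.
toy_counts = [
    {"A": 0, "C": 0, "G": 0, "T": 8},
    {"A": 0, "C": 0, "G": 8, "T": 0},
    {"A": 8, "C": 0, "G": 0, "T": 0},
]
toy_matrix = log_odds_matrix(toy_counts)
toy_hits = scan("ACTGATTGACCTGA", toy_matrix, threshold=4.0)
toy_clusters = clustered_hits(toy_hits, window_size=10, min_hits=2)
```

In this sketch a gene would be called a putative target either when any single site passes the threshold or when a cluster is found near it. Which rule works better can differ from motif to motif, which is the practical difficulty the paragraph above describes.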

