Abstract

Background: Identifying transcriptional enhancers and other cis-regulatory modules (CRMs) is an important goal of post-sequencing genome annotation. Computational approaches provide a useful complement to empirical methods for CRM discovery, but it is critical that we develop effective means to evaluate their performance in terms of estimating their sensitivity and specificity.

Results: We introduce here pCRMeval, a pipeline for in silico evaluation of any enhancer prediction tools that are flexible enough to be applied to the Drosophila melanogaster genome. pCRMeval compares the result of predictions with the extensive existing knowledge of experimentally-validated Drosophila CRMs in order to estimate the precision and relative sensitivity of the prediction method. In the case of supervised prediction methods, in which training data composed of validated CRMs are used, pCRMeval can also assess the sensitivity of specific training sets. We demonstrate the utility of pCRMeval through evaluation of our SCRMshaw CRM prediction method and training data. By measuring the impact of different parameters on SCRMshaw performance, as assessed by pCRMeval, we develop a more robust version of SCRMshaw, SCRMshaw_HD, that improves the number of predictions while maintaining sensitivity and specificity. Our analysis also demonstrates that SCRMshaw_HD, when applied to increasingly less well-assembled genomes, maintains its strong predictive power with only a minor drop-off in performance.

Conclusion: Our pCRMeval pipeline provides a general framework for evaluation that can be applied to any CRM prediction method, particularly a supervised method. While we make use of it here primarily to test and improve a particular method for CRM prediction, SCRMshaw, pCRMeval should provide a valuable platform to the research community not only for evaluating individual methods, but also for comparing between competing methods.
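As a rough illustration of the kind of comparison described in the abstract, the sketch below scores a set of predicted regions against a set of validated CRMs. It is not the pCRMeval implementation: the any-overlap criterion, the half-open coordinate convention, the function names, and the toy intervals are all assumptions made for this example, and pCRMeval's own definitions of precision and relative sensitivity may differ.

```python
from typing import List, Tuple

# (chromosome, start, end) with half-open coordinates; an assumed convention.
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def precision_and_recall(
    predicted: List[Interval], known: List[Interval]
) -> Tuple[float, float]:
    """Score predictions against validated regions using any-overlap matching.

    precision: fraction of predictions that overlap a validated region.
    recall:    fraction of validated regions overlapped by a prediction.
    """
    hit_predictions = sum(any(overlaps(p, k) for k in known) for p in predicted)
    recovered_known = sum(any(overlaps(k, p) for p in predicted) for k in known)
    precision = hit_predictions / len(predicted) if predicted else 0.0
    recall = recovered_known / len(known) if known else 0.0
    return precision, recall


# Toy data, invented for illustration only.
predicted = [("2L", 100, 600), ("2L", 5000, 5500), ("3R", 200, 700)]
known = [("2L", 400, 900), ("3R", 650, 1200), ("X", 10, 500), ("X", 900, 1400)]

precision, recall = precision_and_recall(predicted, known)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Because the catalogue of validated CRMs is incomplete, a prediction that overlaps no known CRM is not necessarily a false positive, which is one reason a simple overlap score like this one can only approximate the measures an evaluation pipeline reports.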

Highlights

  • Identifying transcriptional enhancers and other cis-regulatory modules (CRMs) is an important goal of post-sequencing genome annotation

  • Our pipeline provides a general framework that can be used to evaluate any CRM prediction tool flexible enough to be applied to the Drosophila melanogaster genome

  • We developed a comprehensive pipeline, pCRMeval, for in silico evaluation of CRM prediction methods



Introduction

Identifying transcriptional enhancers and other cis-regulatory modules (CRMs) is an important goal of post-sequencing genome annotation. Located upstream, downstream, and within introns of their associated genes, and often at a considerable genomic distance, CRM sequences serve as scaffolds for the binding of transcription factors and chromatin-modifying enzymes. Their identification is critical for understanding the spatial and temporal regulation of metazoan gene expression. Computational CRM discovery has several advantages, including low cost, rapid results, and no requirement for access to cell lines, antibodies, tissue samples, and other expensive and/or limiting biological resources and assays. This is of particular benefit when working with non-model organisms, for which a genome sequence may be available but extensive other genomic data frequently are not. The existence of multiple computational CRM discovery methods leads to a familiar problem: with many software approaches, how do we know which ones perform the best? Given time and resource constraints, typically only a limited number of predicted regulatory elements from a given method can be validated empirically, and a comprehensive set of CRM
