Abstract

Comparative genome sequence analysis is powerful, but sequencing genomes is expensive. It is desirable to be able to predict how many genomes are needed for comparative genomics, and at what evolutionary distances. Here I describe a simple mathematical model for the common problem of identifying conserved sequences. The model leads to some useful rules of thumb. For a given evolutionary distance, the number of comparative genomes needed for a constant level of statistical stringency in identifying conserved regions scales inversely with the size of the conserved feature to be detected. At short evolutionary distances, the number of comparative genomes required also scales inversely with distance. These scaling behaviors provide some intuition for future comparative genome sequencing needs, such as the proposed use of “phylogenetic shadowing” methods using closely related comparative genomes, and the feasibility of high-resolution detection of small conserved features.
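The two inverse scaling rules can be seen in a simplified special case that is easy to work out by hand. The derivation below is a sketch under assumptions that are mine, not necessarily the paper's: conserved sites are invariant; there are N comparative genomes, each treated as an independent comparison at a short neutral distance d (substitutions per site); and a neutral site stays identical with probability about e^{-d}. The symbol \epsilon stands for the FP threshold.

```latex
% Invariant conserved feature of length L: all NL aligned residue pairs are identical,
% so with the cutoff C = NL the false negative probability is zero.
% A neutral region of length L passes this cutoff only if every pair is identical:
\mathrm{FP} \approx \left(e^{-d}\right)^{NL} = e^{-NLd}
% Requiring the false positive probability to stay below the threshold:
e^{-NLd} \le \epsilon
\quad\Longleftrightarrow\quad
N \ge \frac{\ln(1/\epsilon)}{L\,d}
```

Under these assumptions the required N grows as 1/L at fixed d and as 1/d at fixed L, which is the behavior the abstract describes for short evolutionary distances.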

Highlights

  • Comparative genome sequence analysis is a powerful means of identifying functional DNA sequences by their evolutionary conservation [1,2,3]

  • A pilot sequencing effort is underway for a number of mammalian genomes to evaluate their utility for human genome analysis [4]

  • We find the minimum genome number N at which there exists any cutoff C that can satisfy specified false negative probability (FN) and false positive probability (FP) thresholds
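The search described in the last highlight can be sketched as a loop over N that, for each N, scans every cutoff C on the count of identical aligned residues. This is a minimal sketch, not the paper's code. It assumes N independent comparisons at neutral distance d, Jukes-Cantor identity probabilities, binomially distributed identity counts, and a relative rate omega for conserved sites (omega = 0 means invariant). The function names and the parameter omega are hypothetical.

```python
import math

def binom_pmf(n, p):
    """Binomial(n, p) probabilities for k = 0..n, computed in log space."""
    if p <= 0.0:
        return [1.0] + [0.0] * n
    if p >= 1.0:
        return [0.0] * n + [1.0]
    log_p, log_q = math.log(p), math.log1p(-p)
    return [
        math.exp(math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                 + k * log_p + (n - k) * log_q)
        for k in range(n + 1)
    ]

def identity_prob(d):
    """Assumed Jukes-Cantor probability that two aligned residues at distance d match."""
    return 0.25 + 0.75 * math.exp(-4.0 * d / 3.0)

def min_genomes(L, d, omega, fn, fp, n_max=1000):
    """Smallest N, with a cutoff C, such that calling a length-L feature
    'conserved' when it shows >= C identities meets both FN and FP thresholds.

    Assumes N independent comparisons at neutral distance d; conserved sites
    evolve at relative rate omega. Returns None if no N <= n_max works.
    """
    p_neutral = identity_prob(d)
    p_conserved = identity_prob(omega * d)
    for N in range(1, n_max + 1):
        n = N * L  # total aligned residue pairs in the feature
        cons = binom_pmf(n, p_conserved)
        neut = binom_pmf(n, p_neutral)
        # FP(C) = P(neutral count >= C), built as a suffix sum.
        fp_tail = [0.0] * (n + 2)
        for k in range(n, -1, -1):
            fp_tail[k] = fp_tail[k + 1] + neut[k]
        false_neg = 0.0  # FN(C) = P(conserved count < C); grows with C
        for C in range(n + 1):
            if false_neg > fn:
                break  # larger cutoffs can only raise FN further
            if fp_tail[C] <= fp:
                return N, C
            false_neg += cons[C]
    return None

print(min_genomes(L=10, d=0.1, omega=0.0, fn=0.05, fp=0.001))  # (8, 80)
```

With these assumed parameters, doubling L from 10 to 20 lowers the required N from 8 to 4, consistent with the inverse scaling with feature size. For large alignments, a library binomial tail function would be a more robust choice than summing the probability mass function directly.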


Summary

Introduction

Comparative genome sequence analysis is a powerful means of identifying functional DNA sequences by their evolutionary conservation [1,2,3].

Single Nucleotide Resolution Requires Many Genomes

The Cooper model concluded that, for invariant conserved sites, sequencing enough comparative genomes to reach a total branch length of five neutral substitutions per site would give single-nucleotide resolution, with an FP of e^{-5} (about 0.0067) [7].
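One way to see where the e^{-5} figure comes from is to treat substitutions at a neutral site as a Poisson process with rate 1 per unit branch length. Under that assumption, a neutral site accumulates zero substitutions over total branch length T with probability e^{-T}, so it looks invariant by chance with that probability. The sketch below checks the arithmetic. Both helper names are hypothetical, and the genome-count helper assumes that each comparative genome adds an equal, non-overlapping share of branch length, which real phylogenies only approximate.

```python
import math

def invariant_site_fp(total_branch_length):
    # Probability that a neutral site shows zero substitutions over the total
    # branch length, assuming a Poisson substitution process with unit rate.
    return math.exp(-total_branch_length)

def genomes_for_branch_length(total_branch_length, branch_per_genome):
    # Hypothetical helper: number of comparative genomes needed to reach the
    # target total, if each genome adds the same independent branch length.
    return math.ceil(total_branch_length / branch_per_genome)

print(f"{invariant_site_fp(5.0):.4f}")      # 0.0067
print(genomes_for_branch_length(5.0, 0.5))  # 10
```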

