Abstract

Background: There has been a trend toward increasing the phylogenetic scope of genome sequencing without finishing the sequence of the genome. Increasing numbers of genomes are being published in scaffold or contig form. Rearrangement algorithms, however, including gene order-based phylogenetic tools, require whole-genome data on gene order or syntenic block order. How, then, can we use rearrangement algorithms to compare genomes available in scaffold form only? Can the comparative evidence predict the location of unsequenced genes?

Results: Our method optimally fills in genes missing from the scaffolds and incorporates the augmented scaffolds directly into the rearrangement algorithms as if they were chromosomes. We do this with an exact, polynomial-time algorithm. We then correct for the number of extra fusion/fission operations required to make scaffolds comparable to full assemblies. We model how the increase in genomic distance after scaffold filling depends on the proportion of missing genes that are truly absent from the genome rather than merely unsequenced. We estimate the parameters of this model through simulations and by comparing the angiosperm genomes Ricinus communis and Vitis vinifera.

Conclusions: The algorithm compares genomes of 18,300 genes, 4500 of which are missing from one genome, in less than a minute on a MacBook, putting virtually all genomes within range of the method.
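Rearrangement distances generally require both genomes to have the same gene content, which is why missing genes must be filled in before scaffolds can be compared. As an illustration only, the sketch below computes one standard gene-order distance, the double-cut-and-join (DCJ) distance d = N - (C + I/2), where N is the number of genes, C the number of cycles and I the number of odd paths in the adjacency graph, treating every scaffold as a linear chromosome. The abstract does not say which rearrangement distance the method uses, so the choice of DCJ, the function names `genome_vertices` and `dcj_distance`, and the signed-integer input format are assumptions; the sketch does not implement the scaffold-filling algorithm itself.

```python
from collections import defaultdict


def genome_vertices(genome):
    """Adjacencies and telomeres of a genome of *linear* chromosomes.

    genome: list of chromosomes (here also scaffolds), each a non-empty
    list of signed gene IDs; -g means gene g is on the reverse strand.
    Returns a list of tuples: 1 extremity = telomere, 2 = adjacency.
    """
    vertices = []
    for chrom in genome:
        ext = []
        for g in chrom:
            # +g is read tail->head, -g is read head->tail
            ext += [(abs(g), "t"), (abs(g), "h")] if g > 0 else [(abs(g), "h"), (abs(g), "t")]
        vertices.append((ext[0],))                  # left telomere
        for i in range(1, len(ext) - 1, 2):
            vertices.append((ext[i], ext[i + 1]))   # adjacency between neighbouring genes
        vertices.append((ext[-1],))                 # right telomere
    return vertices


def dcj_distance(genome_a, genome_b):
    """DCJ distance N - (C + I/2) computed on the adjacency graph of A and B."""
    genes = {abs(g) for chrom in genome_a for g in chrom}
    if genes != {abs(g) for chrom in genome_b for g in chrom}:
        # unequal gene content: missing genes would have to be filled in first
        raise ValueError("DCJ distance needs identical gene content")
    va, vb = genome_vertices(genome_a), genome_vertices(genome_b)
    n_a = len(va)
    parent = list(range(n_a + len(vb)))  # union-find: A vertices first, then B vertices

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # every gene extremity links its vertex in A to its vertex in B
    b_index = {e: n_a + j for j, v in enumerate(vb) for e in v}
    for i, v in enumerate(va):
        for e in v:
            parent[find(i)] = find(b_index[e])

    telomeres = defaultdict(lambda: [0, 0])  # component root -> [#A telomeres, #B telomeres]
    roots = set()
    for node, v in enumerate(va + vb):
        root = find(node)
        roots.add(root)
        if len(v) == 1:
            telomeres[root][0 if node < n_a else 1] += 1
    cycles = len(roots - set(telomeres))  # components without telomeres are cycles
    # odd paths run from a telomere of A to a telomere of B
    odd_paths = sum(1 for ta, tb in telomeres.values() if ta == 1 and tb == 1)
    return len(genes) - (cycles + odd_paths // 2)
```

For example, `dcj_distance([[1, 2, 3, 4]], [[1, 2], [3, 4]])` returns 1: splitting one chromosome into two scaffolds costs one fission even though no real rearrangement has occurred, which is the kind of inflation the fusion/fission correction described above is meant to remove.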

Highlights

  • There has been a trend toward increasing the phylogenetic scope of genome sequencing without finishing the sequence of the genome

  • We have found in extensive simulations that if all unsequenced genes were originally located in regions that are gaps after the sequencing and assembly are finished, the distances D and D are identical, or almost so, over a wide range of genome sizes, rearrangement distances and missing gene sets

  • We calculate statistics about how the missing genes are distributed in Vitis, as singletons, pairs, triples or longer runs
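The run statistics mentioned in the last highlight can be sketched as follows. This is a minimal illustration, assuming the reference genome (Vitis in the highlight) is given as ordered lists of gene IDs per chromosome and the missing genes as a set; the function names `missing_run_stats` and `summarize`, and the rule that a run ends at a chromosome boundary, are assumptions rather than details taken from the source.

```python
from collections import Counter


def missing_run_stats(chromosomes, missing):
    """Count maximal runs of consecutive missing genes along each chromosome.

    chromosomes: list of chromosomes, each an ordered list of gene IDs
    missing: set of gene IDs absent from the other genome's scaffolds
    Returns a Counter mapping run length -> number of runs.
    """
    runs = Counter()
    for chrom in chromosomes:
        length = 0
        for gene in chrom:
            if gene in missing:
                length += 1              # extend the current run
            elif length:
                runs[length] += 1        # a present gene closes the run
                length = 0
        if length:
            runs[length] += 1            # assumption: runs stop at chromosome ends
    return runs


def summarize(runs):
    """Group run lengths into singletons, pairs, triples and longer runs."""
    return {
        "singletons": runs.get(1, 0),
        "pairs": runs.get(2, 0),
        "triples": runs.get(3, 0),
        "longer": sum(n for k, n in runs.items() if k > 3),
    }
```

For a single chromosome `["a", "b", "c", "d", "e"]` with missing genes `{"b", "d", "e"}`, the sketch reports one singleton (`b`) and one pair (`d`, `e`).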


Summary

Introduction

There has been a trend toward increasing the phylogenetic scope of genome sequencing without finishing the sequence of the genome. How can we use rearrangement algorithms to compare genomes available in scaffold form only? The dramatic drop in the cost of genome sequencing has two somewhat contradictory effects on the study of gene order. On one hand, it greatly increases the range of organisms available for genomic analysis, including comparative studies and phylogenomics. On the other hand, more and more genomes are published only in scaffold or contig form, whereas rearrangement algorithms require whole-genome data on gene order: items whose chromosomal location is unknown cannot be part of the input. This puts the many draft genomes outside the scope of currently available comparison technology, even though these data may be suitable for other goals of genomics.

