Abstract

We exploit the methodological similarity between ancestral genome reconstruction and extant genome scaffolding. We present a method, called ARt-DeCo, that constructs neighborhood relationships between genes or contigs, in both ancestral and extant genomes, in a phylogenetic context. It is able to handle dozens of complete genomes, including genes with complex histories, by using gene phylogenies reconciled with a species tree, that is, annotated with speciation, duplication and loss events. Each reconstructed ancestral or extant synteny comes with a support value computed from an exhaustive exploration of the solution space. We compare our method with a previously published one that pursues the same goal, on a small number of genomes with universal unicopy genes. We then test it on the whole Ensembl database, proposing partial ancestral genome structures as well as a more complete scaffolding for many partially assembled genomes among 69 eukaryote species. We carefully analyze a couple of extant adjacencies proposed by our method, and show that they are indeed real links in the extant genomes that were missing from the current assembly. On a reduced data set of 39 eutherian mammals, we estimate the precision and sensitivity of ARt-DeCo by simulating a fragmentation of some well assembled genomes and measuring how many adjacencies are recovered. We find a very high precision, while the sensitivity depends on the quality of the data and on the presence of closely related genomes.

Highlights

  • Knowledge of genome organization and of its dynamics is an important question in several fields such as cancer genomics [1,2,3], to understand gene interactions involved in a common molecular pathway [4], or evolutionary biology, for example to establish a species phylogeny by comparative analysis of gene orders [5]

  • The first data set is composed of 7 tetrapod species with only universal unicopy genes, and aims at comparing our method with the method of Aganezov et al. [29]; on this data set, we obtain comparable results

  • Ancestral genes may be seen as contigs, and finding an order between contigs is a similar problem in both extant and ancestral genomes


Summary

Introduction

Knowledge of genome organization (gene content and order) and of its dynamics is an important question in several fields such as cancer genomics [1,2,3], to understand gene interactions involved in a common molecular pathway [4], or evolutionary biology, for example to establish a species phylogeny by comparative analysis of gene orders [5]. On a reduced data set of 39 whole mammalian genomes, we tested the precision and sensitivity of the scaffolding performed by ARt-DeCo by simulating artificial fragmentation of the human or horse genomes, removing up to 75% of the known gene neighborhoods of these well assembled genomes, and comparing the removed adjacencies with the ones proposed by ARt-DeCo. We measure a >95% precision, while sensitivity, as expected, depends on the quality of the data and on the presence of closely related extant genomes. This delimits the domain of efficiency of our method: the vast majority of proposed adjacencies can be considered with confidence, but the resulting scaffolding is still incomplete.
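The evaluation protocol described above (remove a fraction of known adjacencies, then compare them with the proposed ones) can be illustrated with a minimal sketch. This is not the ARt-DeCo implementation: the function names, the representation of an adjacency as an unordered pair of gene identifiers, and the use of the standard definitions precision = TP / proposed and sensitivity = TP / removed are all assumptions made for illustration.

```python
import random


def adjacency(gene_a, gene_b):
    """Hypothetical representation: an adjacency is an unordered gene pair."""
    return frozenset((gene_a, gene_b))


def simulate_fragmentation(adjacencies, fraction, seed=0):
    """Randomly remove `fraction` of the known adjacencies.

    Returns (kept, removed). The seed only makes the toy run repeatable.
    """
    rng = random.Random(seed)
    # Sort first so that sampling does not depend on set iteration order.
    ordered = sorted(adjacencies, key=lambda adj: tuple(sorted(adj)))
    n_removed = round(fraction * len(ordered))
    removed = set(rng.sample(ordered, n_removed))
    kept = set(ordered) - removed
    return kept, removed


def precision_and_sensitivity(proposed, removed):
    """Compare proposed adjacencies with the ones that were removed.

    Assumed definitions: precision = TP / |proposed|,
    sensitivity = TP / |removed|.
    """
    true_positives = len(proposed & removed)
    precision = true_positives / len(proposed) if proposed else 0.0
    sensitivity = true_positives / len(removed) if removed else 0.0
    return precision, sensitivity


if __name__ == "__main__":
    # Toy linear chromosome g0-g1-...-g8, i.e. 8 known adjacencies.
    genes = [f"g{i}" for i in range(9)]
    known = {adjacency(a, b) for a, b in zip(genes, genes[1:])}
    kept, removed = simulate_fragmentation(known, fraction=0.75)

    # Stand-in for a scaffolder's output: half of the removed adjacencies.
    ordered_removed = sorted(removed, key=lambda adj: tuple(sorted(adj)))
    proposed = set(ordered_removed[: len(ordered_removed) // 2])
    print(precision_and_sensitivity(proposed, removed))
```

In this toy run every proposed adjacency is correct but only half of the removed ones are recovered, which mirrors the regime reported above: high precision with incomplete sensitivity.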
