Abstract

Background: Cross-species whole-genome sequence alignment is a critical first step for genome comparative analyses, ranging from the detection of sequence variants to studies of chromosome evolution. Animal genomes are large and complex, and whole-genome alignment is a computationally intensive process, requiring expensive high-performance computing systems due to the need to explore extensive local alignments. With hundreds of sequenced animal genomes available from multiple projects, there is an increasing demand for genome comparative analyses.

Results: Here, we introduce G-Anchor, a new, fast, and efficient pipeline that uses a strictly limited but highly effective set of local sequence alignments to anchor (or map) an animal genome to another species’ reference genome. G-Anchor makes novel use of a databank of highly conserved DNA sequence elements. We demonstrate how these elements may be aligned to a pair of genomes, creating anchors. These anchors enable the rapid mapping of scaffolds from a de novo assembled genome to chromosome assemblies of a reference species. Our results demonstrate that G-Anchor can successfully anchor a vertebrate genome onto a phylogenetically related reference species genome using a desktop or laptop computer within a few hours and with comparable accuracy to that achieved by a highly accurate whole-genome alignment tool such as LASTZ. G-Anchor thus makes whole-genome comparisons accessible to researchers with limited computational resources.

Conclusions: G-Anchor is a ready-to-use tool for anchoring a pair of vertebrate genomes. It may be used with large genomes that contain a significant fraction of evolutionarily conserved DNA sequences and that are not highly repetitive, polyploid, or excessively fragmented. G-Anchor is not a substitute for whole-genome alignment software but can be used for fast and accurate initial genome comparisons. G-Anchor is a freely available, ready-to-use tool for the pairwise comparison of two genomes.
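As an illustration of the anchoring idea described in the Results, the following minimal Python sketch assigns each scaffold to the reference chromosome with which it shares the most conserved elements. It is not G-Anchor's implementation: the function name `anchor_scaffolds`, the dictionary layout, the majority-vote rule, and the `min_anchors` threshold are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def anchor_scaffolds(target_hits, reference_hits, min_anchors=2):
    """Assign each target scaffold to a reference chromosome.

    Hypothetical inputs (not G-Anchor's file formats):
      target_hits:    {element_id: (scaffold, start)}
      reference_hits: {element_id: (chromosome, start)}
    An element present in both dictionaries acts as one anchor.
    """
    votes = defaultdict(Counter)
    for element_id, (scaffold, _start) in target_hits.items():
        ref = reference_hits.get(element_id)
        if ref is not None:
            # The element aligned in both genomes, so it links
            # this scaffold to that reference chromosome.
            votes[scaffold][ref[0]] += 1

    assignment = {}
    for scaffold, counts in votes.items():
        chromosome, n_anchors = counts.most_common(1)[0]
        # Assumed rule: require a minimum number of supporting anchors.
        if n_anchors >= min_anchors:
            assignment[scaffold] = (chromosome, n_anchors)
    return assignment


# Toy data: scaf_A shares two elements with chr1 and one with chr2;
# scaf_B has a single anchor and is left unassigned.
target_hits = {
    "hce1": ("scaf_A", 100),
    "hce2": ("scaf_A", 900),
    "hce3": ("scaf_A", 1500),
    "hce4": ("scaf_B", 50),
}
reference_hits = {
    "hce1": ("chr1", 10_000),
    "hce2": ("chr1", 10_800),
    "hce3": ("chr2", 500),
    "hce4": ("chr3", 70),
}
result = anchor_scaffolds(target_hits, reference_hits)
print(result)
```

A real pipeline would also need to consider anchor order and orientation along each scaffold. The sketch ignores both and only counts shared elements.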

Highlights

  • Accurate alignment of 2 or more genomes is an important step for applications such as annotating a de novo sequenced and assembled genome, performing cross-species genome evolutionary studies, reconstructing ancestral genomes [1,2,3], and detecting variations and genes under selection within a species [4]

  • Our results demonstrate that G-Anchor can successfully anchor a vertebrate genome onto a phylogenetically related reference species genome using a desktop or laptop computer within a few hours and with comparable accuracy to that achieved by a highly accurate whole-genome alignment tool such as LASTZ

  • Hundreds more genomes are currently being sequenced by the Genome 10K community [7], other international consortia, and individual groups [8,9]. Many of these genomes are being included in bulk annotations produced by large genomic centers, and multiple whole-genome alignments are publicly available from centralized databases such as Ensembl and the University of California, Santa Cruz (UCSC) Genome Browser [10,11]

Summary

Introduction

Accurate alignment of 2 or more genomes is an important step for applications such as annotating a de novo sequenced and assembled genome, performing cross-species genome evolutionary studies, reconstructing ancestral genomes [1,2,3], and detecting variations and genes under selection within a species [4].

Although the Mammalian datasets (split or not) were built from the same multiple alignments (but included many more species), the large number of HCE aligned onto the cattle genome allowed the fraction intersecting with the LASTZ-based results to reach a higher level (see Additional File 1, Supplementary Fig. S1).

After aligning to the ga-target (mallard) and filtering, as described in G-Anchor’s stage 3, roughly 950 000 HCE aligned to the reference genome in unique positions with a 59-bp median length and 9% genome coverage, setting the HCE anchors (Additional File, Supplementary Table S2). The G-Anchor pipeline mapped slightly less than 90% of the mallard genome’s scaffolds compared to the LASTZ-based alignments, covering 96% of the LASTZ alignment blocks’ length (Additional File, Supplementary Table S3).

To optimize the performance of BLAT in the preprocessing step, only the -ooc parameter can be used.
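The filtering summarized above (keeping HCE that align in unique positions, then reporting their median length and genome coverage) can be sketched in Python as follows. This is an illustrative sketch only: the function `unique_anchor_stats`, the tuple layout of a hit, and the reading of "unique position" as "the element aligns exactly once" are assumptions, not details taken from G-Anchor's stage 3.

```python
from collections import Counter
from statistics import median


def unique_anchor_stats(hits, genome_length):
    """Summarize uniquely aligned elements.

    hits: list of (element_id, chromosome, start, end), end exclusive.
    This layout is hypothetical and chosen for the example.
    Returns (unique_hits, median_length, fraction_of_genome_covered).
    """
    # Assumption: an element is "unique" if it aligns exactly once.
    counts = Counter(element_id for element_id, *_ in hits)
    unique = [h for h in hits if counts[h[0]] == 1]

    lengths = [end - start for _, _, start, end in unique]

    # Merge overlapping intervals per chromosome so that shared
    # bases are counted once when computing coverage.
    by_chrom = {}
    for _, chrom, start, end in unique:
        by_chrom.setdefault(chrom, []).append((start, end))

    covered = 0
    for intervals in by_chrom.values():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:
                cur_end = max(cur_end, end)
            else:
                covered += cur_end - cur_start
                cur_start, cur_end = start, end
        covered += cur_end - cur_start

    median_length = median(lengths) if lengths else 0
    return unique, median_length, covered / genome_length


# Toy data: element e3 aligns twice and is discarded.
hits = [
    ("e1", "chr1", 0, 60),
    ("e2", "chr1", 50, 100),
    ("e3", "chr1", 200, 259),
    ("e3", "chr2", 10, 69),
    ("e4", "chr2", 0, 40),
]
unique_hits, median_len, coverage = unique_anchor_stats(hits, genome_length=1000)
print(len(unique_hits), median_len, coverage)
```

Merging intervals before summing keeps overlapping elements from inflating the coverage figure. Whether the paper's reported coverage was computed this way is not stated in the text.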

Discussion
Availability of supporting data
Mapping Inconsistencies Figure S2
G-Anchor pipeline in Human-Mouse comparison Table S4