Abstract

Background: Large-scale sequencing projects provide high-quality full-genome data that can be used to reconstruct chromosomal exchanges and rearrangements that disrupt conserved syntenic blocks. The highest resolution of cross-species homology can be obtained from whole-genome, reference-free alignments. Very large multiple alignments of full-genome sequences stored in a binary format demand an accurate and efficient computational approach for synteny block production.

Findings: halSynteny performs efficient processing of pairwise alignment blocks for any pair of genomes in the alignment. The tool is part of the HAL comparative genomics suite and is designed to build synteny blocks for multi-hundred–way, reference-free vertebrate alignments built with the Cactus system.

Conclusions: halSynteny enables accurate and rapid identification of synteny in multiple full-genome alignments. The method is implemented in C++11 as a component of the halTools software and released under the MIT license. The package is available at https://github.com/ComparativeGenomicsToolkit/hal/.

Highlights

  • Conserved synteny blocks provide a conceptual framework for the analysis of interspecies homology

  • We describe a heuristic algorithm that operates on a pair of selected genome assemblies in the HAL multiple alignment

  • halSynteny implements an algorithm that produces synteny blocks from a genome alignment and is designed to take binary HAL files as input

Introduction

Conserved synteny blocks provide a conceptual framework for the analysis of interspecies homology. The GRIMM-Synteny [2] algorithm reconstructs an anchor graph from a predefined set of homologous hits shared by genomes, which can be local pairwise alignments or orthologous genes. CYNTENATOR [11] uses phylogenetic information and performs progressive alignment of the gene order among multiple genomes. These tools all require various data formats, which must be derived from the alignment, such as a predefined set of homologous genomic markers, or genome alignment blocks, each being a sequence of aligned bases that is contiguous in each of the [...]

(1) Initialize weight labels of vertices and edges: [...] value of the difference between start and end coordinates in the target or query genomes, which is called the size of the corresponding alignment block.
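The block-size definition in the initialization step can be illustrated with a minimal Python sketch. It is not the halSynteny implementation, which is written in C++11. The `AlignmentBlock` class, its field names, and the `initial_vertex_weights` helper are hypothetical, and the sketch assumes that the size is the absolute difference between the start and end coordinates and that each vertex corresponds to one alignment block weighted by its query-side size. The excerpt does not say how edge weights are initialized, so they are left out.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class AlignmentBlock:
    """One pairwise alignment block (hypothetical structure, for illustration only)."""
    query_start: int
    query_end: int
    target_start: int
    target_end: int

    def query_size(self) -> int:
        # Size on the query genome: assumed here to be the absolute
        # difference between the start and end coordinates.
        return abs(self.query_end - self.query_start)

    def target_size(self) -> int:
        # Same definition on the target genome; abs() also covers blocks
        # whose target coordinates are given in reverse order.
        return abs(self.target_end - self.target_start)


def initial_vertex_weights(blocks: List[AlignmentBlock]) -> Dict[int, int]:
    """Assign each block (one vertex per block) its query-side size as weight.

    Using the query side is an assumption of this sketch, not something
    the source text specifies.
    """
    return {index: block.query_size() for index, block in enumerate(blocks)}


blocks = [
    AlignmentBlock(query_start=100, query_end=250, target_start=900, target_end=750),
    AlignmentBlock(query_start=300, query_end=320, target_start=700, target_end=680),
]
weights = initial_vertex_weights(blocks)
```

In this example both blocks have their target coordinates in decreasing order, and taking the absolute difference keeps the size non-negative regardless of orientation.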

