Abstract

Background: Identification of homologous regions or conserved syntenies across genomes is a crucial step in comparative genomics. This task is usually performed by genome alignment software such as WABA or blastz. In the case of conserved syntenies, such regions are defined as conserved gene orders. At the gene order level, homologous regions can be found even between distantly related genomes that do not align at the nucleotide sequence level.

Results: We present a novel approach to identify regions of conserved synteny across multiple genomes. Syntenator represents genomes, and alignments thereof, as partial order graphs (POGs). These POGs are aligned by a dynamic programming approach that employs a gene-specific scoring function. The scoring function reflects the level of protein sequence similarity for each possible gene pair. Our method consistently defines larger homologous regions in pairwise gene order alignments than nucleotide-level comparisons do. Our method is superior to methods that work on predefined homology gene sets (as implemented in Blockfinder). Syntenator successfully reproduces 80% of the EnsEMBL man-mouse conserved syntenic blocks. The full potential of our method becomes visible when comparing remotely related genomes and multiple genomes. Gene order alignments potentially resolve up to 75% of the EnsEMBL 1:many orthology relations and 27% of the many:many orthology relations.

Conclusion: We propose Syntenator as a software solution to reliably infer conserved syntenies among distantly related genomes. The software is available from .

Highlights

  • Identification of homologous regions or conserved syntenies across genomes is one crucial step in comparative genomics

  • Depending on the level of divergence, homologous regions are usually defined by conserved orders of local genomic alignments [3], orthologous exons [4] or genes [5]

  • We propose the Syntenator algorithm that facilitates multiple gene order alignments with a novel scoring function
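The idea of aligning gene orders by dynamic programming with a gene-pair scoring function can be illustrated with a small sketch. This is not the Syntenator implementation: it aligns two plain linear gene orders rather than partial order graphs, handles only the pairwise case, and uses a Smith-Waterman-style local recurrence as a stand-in. The function name `align_gene_orders`, the `similarity` dictionary (standing in for protein similarity scores) and the linear `gap_penalty` are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def align_gene_orders(
    genome_a: List[str],
    genome_b: List[str],
    similarity: Dict[Tuple[str, str], float],
    gap_penalty: float = 0.5,
) -> Tuple[float, List[Tuple[str, str]]]:
    """Local alignment of two linear gene orders (illustrative only).

    `similarity` maps (gene_a, gene_b) to a positive score; gene pairs
    that are absent from it can never be aligned to each other.
    """
    n, m = len(genome_a), len(genome_b)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0.0, (0, 0)

    # Fill the dynamic programming matrix.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = similarity.get((genome_a[i - 1], genome_b[j - 1]))
            diag = score[i - 1][j - 1] + pair if pair is not None else float("-inf")
            up = score[i - 1][j] - gap_penalty      # skip a gene in genome_a
            left = score[i][j - 1] - gap_penalty    # skip a gene in genome_b
            score[i][j] = max(0.0, diag, up, left)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell to recover the aligned gene pairs.
    pairs: List[Tuple[str, str]] = []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        pair = similarity.get((genome_a[i - 1], genome_b[j - 1]))
        if pair is not None and score[i][j] == score[i - 1][j - 1] + pair:
            pairs.append((genome_a[i - 1], genome_b[j - 1]))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] - gap_penalty:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return best, pairs


# Hypothetical toy input: gene names and scores are made up.
human = ["a1", "a2", "a3", "a4"]
mouse = ["b1", "b2", "bX", "b4"]
sim = {("a1", "b1"): 2.0, ("a2", "b2"): 3.0, ("a4", "b4"): 2.0}
block_score, block = align_gene_orders(human, mouse, sim)
```

In this toy input the unmatched genes `a3` and `bX` are bridged by gaps, so the three similar gene pairs end up in a single block. This is the intuition behind gene order alignments tolerating regions that lack nucleotide-level alignments.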


Summary

Results

We applied both approaches to detect conserved syntenies in four mammalian species, namely human (NCBI 36), mouse (NCBI m36), rat (RGSC 3.4) and dog (CanFam 1.0).

Pairwise genome comparison: We compared the performance of the gene order alignment approaches (Blockfinder and Syntenator) to the EnsEMBL Compara synteny data set. Conserved syntenic regions are not necessarily completely covered by nucleotide-level alignments. This is the main reason why the Compara data set covers more genes. Light gray bars represent the proportion of human and mouse genes that fall into regions covered by alignments. Both numbers are the same for Syntenator, as it considers all genes of a genome simultaneously. To our knowledge, a good test set is not available, and simulating whole-genome evolution is beyond the scope of this manuscript.

