Abstract

Background: Perfectly or highly conserved DNA elements were found in vertebrates, invertebrates, and plants by various methods. However, little is known about such elements in protists. The evolutionary distance between apicomplexans can be very high, in particular, due to the positive selection pressure on them. This complicates the identification of highly conserved elements in alveolates, which is overcome by the proposed algorithm.

Results: A novel algorithm is developed to identify highly conserved DNA elements. It is based on the identification of dense subgraphs in a specially built multipartite graph (whose parts correspond to genomes). Specifically, the algorithm relies neither on genome alignments nor on pre-identified perfectly conserved elements; instead, it performs a fast search for pairs of words (in different genomes) of maximum length with the difference below the specified edit distance. Each such pair defines an edge whose weight equals the maximum (or total) length of the words assigned to its ends. The graph composed of these edges is then compacted by merging some of its edges and vertices. The dense subgraphs are identified by a cellular automaton-like algorithm; each subgraph defines a cluster composed of similar inextensible words from different genomes. Almost all clusters are considered as predicted highly conserved elements. The algorithm is applied to the nuclear genomes of the superphylum Alveolata, and the corresponding phylogenetic tree is built and discussed.

Conclusion: We proposed an algorithm for the identification of highly conserved elements. The multitude of identified elements was used to infer the phylogeny of Alveolata.

Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-1257-5) contains supplementary material, which is available to authorized users.
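The graph construction described in the Results can be illustrated with a toy sketch. It is not the authors' implementation and makes several simplifying assumptions. Words have a fixed length `k` instead of maximum length. All word pairs are compared by brute force rather than by a fast search. A pair is linked when its edit distance is at most `max_dist`. Plain connected components stand in for the cellular automaton-like dense-subgraph step, and the compaction step is omitted. The function names (`edit_distance`, `build_edges`, `clusters`) and parameters are hypothetical.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance by dynamic programming, O(len(a) * len(b))."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution or match
        prev = curr
    return prev[-1]


def build_edges(genomes: dict, k: int, max_dist: int) -> list:
    """Link words of length k from different genomes (multipartite graph).

    A vertex is (genome_name, position). Edges only join vertices from
    different genomes. With fixed-length words, the edge weight
    (maximum word length at its ends) is simply k.
    """
    edges = []
    for (name_a, seq_a), (name_b, seq_b) in combinations(genomes.items(), 2):
        for i in range(len(seq_a) - k + 1):
            for j in range(len(seq_b) - k + 1):
                if edit_distance(seq_a[i:i + k], seq_b[j:j + k]) <= max_dist:
                    edges.append(((name_a, i), (name_b, j), k))
    return edges


def clusters(edges: list) -> list:
    """Group vertices into connected components with union-find.

    Assumption: this replaces the dense-subgraph search of the paper.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v, _weight in edges:
        parent[find(u)] = find(v)

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


if __name__ == "__main__":
    toy = {"g1": "ACGT", "g2": "ACGA", "g3": "TTTT"}  # made-up sequences
    found = build_edges(toy, k=4, max_dist=1)
    print(found)
    print(clusters(found))
```

In this toy input only the words from `g1` and `g2` differ by a single substitution, so they form the only edge and the only cluster. The brute-force comparison is quadratic in genome length and is meant only to show the data structure, not to scale to real genomes.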

Highlights

  • Perfectly or highly conserved DNA elements were found in vertebrates, invertebrates, and plants by various methods

  • The longest (6.99 Mbp) chromosome of Neospora caninum was collated in turn with three well-assembled full genomes: Babesia microti of 4 chromosomes (6.39 Mbp in total), Cryptosporidium parvum of 8 chromosomes (9.1 Mbp), and Plasmodium falciparum of 16

  • We presented a novel algorithm to identify highly conserved DNA elements; it was applied to the superphylum Alveolata


Summary

Introduction

Perfectly or highly conserved DNA elements were found in vertebrates, invertebrates, and plants by various methods. The evolutionary distance between apicomplexans can be very high, in particular, due to the positive selection pressure on them. This complicates the identification of highly conserved elements in alveolates, which is overcome by the proposed algorithm. Ultraconserved elements (UCEs) are perfectly conserved regions of genomes shared among evolutionarily distant taxa. It is assumed that these regions are identical in closely related species and have minor differences in relatively distant ones, which substantially limits the phylogenetic distances. Hundreds of conserved noncoding sequences were detected in four dicotyledonous plant species: Arabidopsis thaliana, Carica papaya, Populus trichocarpa, and Vitis vinifera [3].
