Abstract

We propose a computational method to measure and visualize interrelationships among any number of DNA sequences allowing, for example, the examination of hundreds or thousands of complete mitochondrial genomes. An "image distance" is computed for each pair of graphical representations of DNA sequences, and the distances are visualized as a Molecular Distance Map: Each point on the map represents a DNA sequence, and the spatial proximity between any two points reflects the degree of structural similarity between the corresponding sequences. The graphical representation of DNA sequences utilized, Chaos Game Representation (CGR), is genome- and species-specific and can thus act as a genomic signature. Consequently, Molecular Distance Maps could inform species identification, taxonomic classifications and, to a certain extent, evolutionary history. The image distance employed, Structural Dissimilarity Index (DSSIM), implicitly compares the occurrences of oligomers of length up to k (herein k = 9) in DNA sequences. We computed DSSIM distances for more than 5 million pairs of complete mitochondrial genomes, and used Multi-Dimensional Scaling (MDS) to obtain Molecular Distance Maps that visually display the sequence relatedness in various subsets, at different taxonomic levels. This general-purpose method does not require DNA sequence alignment and can thus be used to compare similar or vastly different DNA sequences, genomic or computer-generated, of the same or different lengths. We illustrate potential uses of this approach by applying it to several taxonomic subsets: phylum Vertebrata, (super)kingdom Protista, classes Amphibia-Insecta-Mammalia, class Amphibia, and order Primates. This analysis of an extensive dataset confirms that the oligomer composition of full mtDNA sequences can be a source of taxonomic information. 
This method also correctly identifies the mtDNA sequences most closely related to that of the anatomically modern human (the Neanderthal, the Denisovan, and the chimp), and finds that the sequence most different from it in this dataset belongs to a cucumber.
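
To make the Chaos Game Representation mentioned in the abstract concrete, the following Python sketch builds a binary 2^k x 2^k CGR image from a DNA string. The corner assignment for A, C, G and T, the starting point at the centre of the square, and the choice to skip non-ACGT symbols are assumptions based on a common CGR convention, not details stated in this abstract. The function name cgr_image is hypothetical.

```python
# Assumed corner layout on the unit square (a common CGR convention).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_image(seq, k=9):
    """Return a 2^k x 2^k binary image (list of rows) for a DNA string."""
    size = 2 ** k
    image = [[0] * size for _ in range(size)]
    x, y = 0.5, 0.5  # assumed starting point: centre of the square
    for base in seq.upper():
        corner = CORNERS.get(base)
        if corner is None:
            continue  # assumption: ambiguous symbols such as N are skipped
        # Move halfway from the current point towards the base's corner.
        x = (x + corner[0]) / 2
        y = (y + corner[1]) / 2
        col = min(int(x * size), size - 1)
        row = min(int(y * size), size - 1)
        image[row][col] = 1  # mark presence; counts are not kept
    return image
```

With k = 9 the image has 512 x 512 pixels. Once at least k bases have been read, the pixel that is set depends only on the last k bases, which is why comparing two such images implicitly compares the oligomers that occur in the two sequences.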

Highlights

  • Even though every year biologists discover and classify thousands of new species, it is estimated that as many as 86% of existing species on Earth and 91% of species in the oceans have not yet been classified (PLOS ONE | DOI:10.1371/journal.pone.0119815, May 22, 2015)

  • The Molecular Distance Maps we analyzed, of several different taxonomic subsets (phylum Vertebrata, kingdom Protista, classes Amphibia-Insecta-Mammalia, class Amphibia only, and order Primates), confirm that the presence or absence of oligomers in mtDNA sequences may contain information that is relevant to taxonomic classifications

  • In all computations of this paper we use k = 9. The DSSIM image distance is highly sensitive and successfully groups hundreds of Chaos Game Representations (CGRs) that are visually similar, such as the ones in Fig. 1(a) and Fig. 1(c), into correct taxonomic categories
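
The image distance named in the highlights can be illustrated with a small Python sketch. It is a simplified variant, not the authors' implementation: it averages the structural similarity (SSIM) over non-overlapping square blocks with uniform weights, whereas the original SSIM formulation uses a Gaussian-weighted sliding window. The block size, the function names ssim_block and dssim, and the constants c1 and c2 (the usual SSIM defaults for pixel values in [0, 1]) are assumptions.

```python
def ssim_block(a, b, c1=0.01 ** 2, c2=0.03 ** 2):
    """SSIM of two equally long flat pixel lists with values in [0, 1]."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    var_a = sum((p - mean_a) ** 2 for p in a) / n
    var_b = sum((q - mean_b) ** 2 for q in b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / n
    numerator = (2 * mean_a * mean_b + c1) * (2 * cov + c2)
    denominator = (mean_a ** 2 + mean_b ** 2 + c1) * (var_a + var_b + c2)
    return numerator / denominator


def dssim(img_a, img_b, block=8):
    """Simplified DSSIM = 1 - mean block SSIM for two square images."""
    size = len(img_a)
    if size != len(img_b) or any(len(r) != size for r in img_a + img_b):
        raise ValueError("images must be square and of equal size")
    scores = []
    for r0 in range(0, size, block):
        for c0 in range(0, size, block):
            rows = range(r0, min(r0 + block, size))
            cols = range(c0, min(c0 + block, size))
            a = [img_a[r][c] for r in rows for c in cols]
            b = [img_b[r][c] for r in rows for c in cols]
            scores.append(ssim_block(a, b))
    return 1 - sum(scores) / len(scores)
```

Under these assumptions, two identical images give a distance of 0, and the distance grows as the block-wise means, variances and covariances of the two images diverge.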

Summary

Introduction

Even though every year biologists discover and classify thousands of new species, it is estimated that as many as 86% of existing species on Earth and 91% of species in the oceans have not yet been classified. The position of the point p_i in the map, relative to all the other points p_j, reflects the distances between the DNA sequence s_i and the other DNA sequences s_j in the dataset. We apply this method to analyze and visualize several different taxonomic subsets of a dataset of 3,176 complete mtDNA sequences: phylum Vertebrata, (super)kingdom Protista, classes Amphibia-Insecta-Mammalia, class Amphibia only, and order Primates. We provide an interactive web tool, MoD Map (Molecular Distance Map), that allows an in-depth exploration of all Molecular Distance Maps in the paper, complete with zoom-in features, search options, and accessible additional information for each sequence-representing point (hereafter called a sequence-point). Overall, this method groups mtDNA sequences into correct taxonomic groups, from the kingdom level down to the order and family level. The appeal of this method lies in its simplicity, robustness, and generality: exactly the same measuring tape can automatically yield meaningful measurements between non-specific DNA sequences of species as distant as the anatomically modern human and a cucumber, and as close as the anatomically modern human and the Neanderthal.
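
How a matrix of pairwise distances becomes a map of points can be sketched with classical (Torgerson) Multi-Dimensional Scaling. The Python code below is a minimal illustration under stated assumptions: the function name classical_mds is hypothetical, the eigenvectors are found by plain power iteration with deflation to keep the example dependency-free, and the input is assumed to be a symmetric distance matrix that is close to Euclidean. This section does not state which MDS variant or solver the authors used.

```python
import math


def classical_mds(dist, dims=2, iters=500):
    """Embed n items in `dims` dimensions from an n x n distance matrix."""
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    total_mean = sum(row_mean) / n
    # Double centering: B = -1/2 * J * D^2 * J (symmetric D assumed).
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for d in range(dims):
        v = [math.sin(i + 1 + d) for i in range(n)]  # deterministic start
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break  # nothing left to extract
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the converged vector.
        lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n))
                  for i in range(n))
        if lam <= 0:
            break  # remaining structure is not Euclidean
        scale = math.sqrt(lam)
        for i in range(n):
            coords[i][d] = v[i] * scale
        # Deflate so the next pass finds the next eigenvector.
        for i in range(n):
            for j in range(n):
                b[i][j] -= lam * v[i] * v[j]
    return coords
```

For a dataset of thousands of sequences, a library eigen-solver would be far faster than this pure-Python loop; the sketch is only meant to show how the position of each point p_i is derived from its distances to all other points.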

