Abstract

Background

The key idea of the DNA barcode initiative is to identify, for each group of species belonging to different kingdoms of life, a short DNA sequence that can act as a true taxon barcode. DNA barcodes are a valuable type of information that can be integrated with ecological, genetic, and morphological data to obtain a more consistent taxonomy. Recent studies have shown that, for the animal kingdom, the mitochondrial gene cytochrome c oxidase I (COI), about 650 bp long, can be used as a barcode sequence for identification and taxonomic purposes. In the present work we introduce an alignment-free approach to the taxonomic analysis of barcode sequences. Our approach is based on two compression-based versions of the non-computable Universal Similarity Metric (USM) class of distances. Our purpose is to justify the use of USM for the analysis of short DNA barcode sequences as well, showing that USM can correctly extract taxonomic information from this kind of sequence.

Results

We downloaded 30 datasets of barcode sequences belonging to different animal species from the Barcode of Life Data System (BOLD) database. We built phylogenetic trees for every dataset, using both compression-based and classic evolutionary methods, and compared them in terms of topology preservation. In the experimental tests, the similarity between evolutionary and compression-based trees was between 80% and 100% for most datasets (94%). We also carried out tests on simulated barcode datasets composed of 100, 150, 200 and 500 sequences, with each simulation replicated 25-fold. In this case, mean similarity scores between evolutionary and compression-based trees ranged from 83% to 99% across all simulated datasets.

Conclusions

In the present work we introduced an alignment-free approach to the taxonomic analysis of barcode sequences, based on two compression-based versions of the non-computable USM class of distances. In this way we demonstrate the reliability of compression-based methods even for the analysis of short barcode sequences. Compression-based methods, with their strong theoretical assumptions, may therefore represent a valid alignment-free and parameter-free approach for barcode studies.
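
To make the idea of a compression-based distance concrete, the following is a minimal sketch of the Normalized Compression Distance (NCD), one of the two distances named in the highlights. It uses the standard definition NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed size. The use of `zlib` as the compressor and the function names `compressed_size` and `ncd` are assumptions made for illustration; this excerpt does not state which compressor the authors used.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (assumed stand-in compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized Compression Distance between two unaligned sequences.

    Values near 0 suggest the sequences share most of their information;
    values near 1 suggest they share little.
    """
    bx, by = x.encode("ascii"), y.encode("ascii")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)  # compress the concatenation xy
    return (cxy - min(cx, cy)) / max(cx, cy)
```

No alignment step is required: the two sequences are only concatenated and compressed, so they may differ in length. A pairwise matrix of such values could then be passed to a tree inference algorithm such as NJ or UPGMA. A general-purpose compressor is a rough approximation on sequences as short as a barcode, so results from this sketch should not be read as reproducing the paper's figures.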

Highlights

  • The key idea of the DNA barcode initiative is to identify, for each group of species belonging to different kingdoms of life, a short DNA sequence that can act as a true taxon barcode

  • We considered as “good” those datasets having a low percentage of sequences with undefined bases and sequences of about the same length

  • For every pair of compression-based distances (NCD and Information-Based Distance (IBD)) and for every pair of phylogenetic tree inference algorithms (NJ and Unweighted Pair Group Method with Arithmetic Mean (UPGMA)), each table gives the similarity scores according to a reference evolutionary distance model (Kimura 2-parameter, Tamura-Nei and so on)

Summary

Introduction

The key idea of the DNA barcode initiative is to identify, for each group of species belonging to different kingdoms of life, a short DNA sequence that can act as a true taxon barcode. DNA barcodes are a valuable type of information that can be integrated with ecological, genetic, and morphological data to obtain a more consistent taxonomy. In the present work we introduce an alignment-free approach to the taxonomic analysis of barcode sequences. The use of DNA sequences to integrate ecological, morphological and genetic information in order to improve taxonomic studies of biological species [1] has been pursued since 2003, starting with Hebert et al. [2]. Taxonomic studies with DNA barcoding data rely on traditional approaches, which consist of evaluating genetic distances among species in order to perform distance-based clustering analysis [12]. These distances, called evolutionary distances, are stochastic estimates and do not define a distance metric [13].
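
As an illustration of what an evolutionary distance looks like in practice, here is a minimal sketch of the Kimura 2-parameter (K2P) distance, which the highlights name as one of the reference models. It uses the standard formula d = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)), where P and Q are the proportions of transitions and transversions between two aligned sequences. The function name `k2p_distance` and the choice to skip positions with gaps or ambiguous bases are assumptions made for this example, not details taken from the paper.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # assumption: ignore gaps and undefined bases
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable positions")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent
    # for the model (argument not positive).
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

Unlike a compression-based distance, this calculation needs the sequences to be aligned first and depends on a substitution model, which is the contrast the alignment-free approach is meant to address.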
