Abstract

The pragmatic species concept for Bacteria and Archaea is ultimately based on DNA-DNA hybridization (DDH). While this technique enables the taxonomist, in principle, to estimate the overall similarity between the genomes of two strains, it is tedious and error-prone and cannot be used to incrementally build up a comparative database. Recent technological progress in genome sequencing calls for bioinformatics methods that replace wet-lab DDH with in-silico genome-to-genome comparison. Here we investigate state-of-the-art methods for inferring whole-genome distances with respect to their ability to mimic DDH. Algorithms that efficiently determine high-scoring segment pairs or maximally unique matches perform well as a basis for inferring intergenomic distances. The examined distance functions, which are able to cope with heavily reduced genomes and repetitive sequence regions, outperform previously described ones in both their correlation with DDH and their error ratios when emulating it. Simulation of incompletely sequenced genomes indicates that some distance formulas are very robust against missing fractions of genomic information. Digitally derived genome-to-genome distances show a better correlation with 16S rRNA gene sequence distances than DDH values do. The future perspectives of genome-informed taxonomy are discussed, and the investigated methods are made available as a web service for genome-based species delineation.

Highlights

  • Macroscopic organisms, such as animals, plants and fungi, are generally easy to classify into species on the basis of abundant morphological differences, behavioral traits, or interbreeding barriers

  • To compare the results obtained with genome BLAST distance phylogenies (GBDP) to those obtained with the 'Average Nucleotide Identity' (ANI) and 'Percentage Conserved DNA' methods as reported by Goris et al. [17], we reduced the dataset to the genome pairs examined in the latter study

  • The overall performance of BLAT could not be increased further by setting the minimum sequence identity required within high-scoring segment pairs (HSPs) and the corresponding minimum scores to their lowest possible values ('blatmin' and 'blatminNF'); combined with this setting, formula (3) and its logarithmic modification performed best, achieving a correlation of -0.763

Summary

Introduction

Macroscopic organisms, such as animals, plants and fungi, are generally easy to classify into species on the basis of abundant morphological differences, behavioral traits, or interbreeding barriers. For microorganisms belonging to the two ‘prokaryotic’ domains of life, Archaea and Bacteria [1], species delineation is a much more challenging task. Morphological features and metabolic peculiarities can be used to classify microorganisms with a certain degree of confidence, but the number of such features that can be recognized for differentiation is rather limited. Consideration of genetic – and nowadays increasingly genomic – features often enables finer resolution, placing DNA-DNA hybridization (DDH) in a key position as a major tool in microbial species delineation [24].
