Abstract

The rapidly growing availability of genome information has created considerable demand for phylogenetic inference algorithms that are both fast and accurate. We present a novel method called DendroBLAST for reconstructing phylogenetic dendrograms/trees from protein sequences using BLAST. This method differs from other methods by incorporating a simple model of sequence evolution to test the effect of introducing sequence changes on the reliability of the bipartitions in the inferred tree. Using realistic simulated sequence data, we demonstrate that this method produces phylogenetic trees that are more accurate than those produced by other commonly used distance-based methods, though not as accurate as those produced by maximum likelihood methods from good-quality multiple sequence alignments. In addition to tests on simulated data, we use DendroBLAST to generate input trees for a supertree reconstruction of the phylogeny of the Archaea. This independent analysis produces an approximate phylogeny of the Archaea that has both high precision and high recall when compared to a previously published analysis of the same dataset using conventional methods. Taken together, these results demonstrate that approximate phylogenetic trees can be produced in the absence of multiple sequence alignments, and we propose that these trees will provide a platform for improving and informing downstream bioinformatic analysis. A web implementation of the DendroBLAST method is freely available for use at http://www.dendroblast.com/.
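The precision and recall mentioned above can be illustrated with a minimal sketch. It assumes each tree has already been reduced to its set of non-trivial bipartitions (splits of the taxon set); the function names and the canonical-form convention are illustrative assumptions and are not taken from the DendroBLAST implementation.

```python
from typing import FrozenSet, Iterable, Set, Tuple

Split = FrozenSet[str]


def canonical(side: Iterable[str], taxa: Iterable[str]) -> Split:
    """Return one fixed side of a bipartition so that equal splits compare equal.

    Convention (an assumption of this sketch): keep the side that does NOT
    contain the alphabetically smallest taxon.
    """
    side = frozenset(side)
    taxa = frozenset(taxa)
    reference_taxon = min(taxa)
    return side if reference_taxon not in side else taxa - side


def precision_recall(inferred: Set[Split], reference: Set[Split]) -> Tuple[float, float]:
    """Precision: fraction of inferred splits found in the reference tree.
    Recall: fraction of reference splits recovered by the inferred tree."""
    shared = len(inferred & reference)
    precision = shared / len(inferred) if inferred else 0.0
    recall = shared / len(reference) if reference else 0.0
    return precision, recall


taxa = {"A", "B", "C", "D", "E"}
reference = {canonical({"A", "B"}, taxa), canonical({"D", "E"}, taxa)}
inferred = {canonical({"A", "B"}, taxa), canonical({"C", "D"}, taxa)}
print(precision_recall(inferred, reference))  # (0.5, 0.5)
```

In this toy case one of the two inferred splits is present in the reference tree, and one of the two reference splits is recovered, so both values are 0.5.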

Highlights

  • Introduction of alignment error: As the simulated multiple sequence alignments do not contain alignment-induced error, an additional test was performed to introduce realistic alignment errors encountered in real multiple sequence alignments.

  • The method uses the BLOSUM62 amino acid substitution matrix to make small numbers of changes to the sequences in order to identify and discard weakly supported bipartitions in the tree. We propose that this method, which relies on widely used existing tools for sequence analysis, will provide a platform for improving and informing multiple aspects of downstream bioinformatic analysis, including multiple sequence alignment generation and phylogenetic tree inference.

  • Each of the 308 simulated alignments was subjected to realignment as above and parsed using GBLOCKS with options configured for conservative data selection.
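The idea of discarding weakly supported bipartitions can be sketched as follows. This is only an illustration under stated assumptions: it supposes that trees have already been inferred from several perturbed copies of the sequences and reduced to sets of bipartitions, and it uses a hypothetical `min_support` threshold. The BLOSUM62-guided perturbation step itself, and the actual criterion used by DendroBLAST, are not reproduced here.

```python
from collections import Counter
from typing import FrozenSet, Iterable, Set

Split = FrozenSet[str]


def filter_supported(
    original: Set[Split],
    replicates: Iterable[Set[Split]],
    min_support: float = 0.5,  # hypothetical threshold, not from the paper
) -> Set[Split]:
    """Keep only the splits of the original tree that reappear in at least
    `min_support` of the trees inferred from perturbed sequences."""
    replicates = list(replicates)
    if not replicates:
        # Without replicate trees there is no evidence against any split.
        return set(original)
    counts = Counter()
    for splits in replicates:
        counts.update(original & set(splits))
    total = len(replicates)
    return {s for s in original if counts[s] / total >= min_support}


ab, de, cd = frozenset("AB"), frozenset("DE"), frozenset("CD")
original_tree = {ab, de}
perturbed_trees = [{ab, de}, {ab}, {ab, cd}]
print(filter_supported(original_tree, perturbed_trees))  # only the A,B split remains
```

Here the split separating A and B appears in all three replicate trees and is kept, whereas the split separating D and E appears in only one of three and is discarded.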



Introduction

Introduction of alignment error

As the simulated multiple sequence alignments do not contain alignment-induced error, an additional test was performed to introduce realistic alignment errors encountered in real multiple sequence alignments. Because the inclusion of gap characters and misaligned data can have negative effects on phylogenetic inference, a common approach is to discard "gappy" information. Popular methods such as GBLOCKS [34] have been developed to automate this process and thereby reduce the amount of possibly misaligned data in multiple sequence alignments. In the case of the above experiment, DendroBLAST was compared to other inference methods using simulated multiple sequence alignments with the addition of alignment-induced error. It is common in phylogenetic analysis for alignments to be trimmed before use. In all cases, trimming the realigned multiple sequence alignments reduced the inference performance of alignment-based methods (Table 3). This effect was more pronounced for the alignments that contained higher error rates (Table 3). Errors in the multiple sequence alignment directly contribute to errors in phylogenetic trees [4,5,6,7].
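To make the notion of discarding "gappy" information concrete, the following is a deliberately simplified sketch that removes alignment columns whose gap fraction exceeds a threshold. It is not the GBLOCKS algorithm, which applies additional conservation and block-length criteria; the function name and the `max_gap_fraction` parameter are assumptions made for illustration.

```python
from typing import List


def trim_gappy_columns(alignment: List[str], max_gap_fraction: float = 0.5) -> List[str]:
    """Drop every column in which the fraction of '-' characters exceeds
    `max_gap_fraction`. All aligned sequences must have the same length."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    n_seqs = len(alignment)
    keep = [
        col
        for col in range(length)
        if sum(seq[col] == "-" for seq in alignment) / n_seqs <= max_gap_fraction
    ]
    return ["".join(seq[col] for col in keep) for seq in alignment]


aligned = ["MK-LV", "MKALV", "M--LV"]
print(trim_gappy_columns(aligned))  # ['MKLV', 'MKLV', 'M-LV']
```

The third column is a gap in two of the three sequences and is removed; the second column has a gap in only one sequence and is retained. As the text notes, trimming of this kind reduced rather than improved the performance of alignment-based methods on the realigned data (Table 3).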

