Abstract

A configuration space of homologous protein sequences (CSHP) was recently constructed from pairwise comparisons, with probabilities deduced from Z-value statistics (Monte Carlo methods applied to pairwise comparisons) and following evolutionary assumptions. A Z-value cut-off is applied so that proteins are placed in the CSHP only when the similarity of a pair of sequences is significant according to the Theorem of the Upper Limit of a score Probability (TULIP theorem). From the positions of similar protein sequences in the CSHP, a classification can be deduced and visualized as trees, called TULIP trees. In previous case studies, TULIP trees were shown to be consistent with phylogenetic trees. To date, no tool has been available to compute TULIP trees following this model. Making available methods that cluster proteins based on pairwise comparisons and evolutionary assumptions should be useful for evaluation and for the future improvements they might inspire.

We developed a web server allowing the local or online computation of TULIP trees based on the CSHP probabilities. The input is a set of homologous protein sequences in multi-FASTA format. Pairwise comparisons are conducted using the Smith-Waterman method, with 100 to 1,000 sequence shufflings to estimate pairwise Z-values. The resulting Z-value matrix is used to infer a tree, which is then written to a file. The output therefore consists of a Z-value matrix, a distance matrix, a TULIP tree file in NEWICK format, and a TULIP tree visualisation.

The TULIP server provides an easy-to-use interface to the TULIP software and allows a classification of protein sequences based on pairwise alignments and following evolutionary assumptions. TULIP trees are consistent with phylogenies in numerous cases, but they can be inconsistent for multi-domain proteins in which some domains have been conserved in all branches. Thus, TULIP trees cannot be considered conventional phylogenetic trees following the MIAPA (Minimum Information About a Phylogenetic Analysis) recommendations. A major strength of the TULIP classification is its statistical validity when analysing samples that include both compositionally unbiased and biased sequences (i.e. with biased amino acid distributions), such as sequences from Plasmodium falciparum. The TULIP web server is a service of the Malaria Portal of the University of Pretoria, South Africa, and is available at http://malport.bi.up.ac.za/TULIP/.

Highlights

  • Evolutionary analysis of genes or proteins is based on sequence comparisons

  • To explore the potential of pairwise alignment-based (PAB) approaches to classify proteins following evolutionary assumptions [2], we designed a spatial representation of protein sequences, with probabilities deduced from Z-value statistics (Monte Carlo methods applied to pairwise comparisons) [23]

  • The TULIP server was initially developed to allow the comparative analyses of proteins including sequences of Plasmodium falciparum, the malaria causative agent, which are atypical due to their strong amino acid compositional bias, low complexity and being 20% longer than their homologues



Introduction

Evolutionary analysis of genes or proteins is based on sequence comparisons. Since Felsenstein introduced the PHYLogeny Inference Package (PHYLIP) in the 1980s [1], phylogeny has classically been predicted from multiple sequence alignments. Recent uses of pairwise alignment-based (PAB) classification for the automatic inference of phylogeny include OrthoMCL [15], which relies on pairwise BLAST comparisons and computes evolutionary distances from E-value statistics (for a review, see [12]). To explore the potential of PAB approaches to classify proteins following evolutionary assumptions [2], we designed a spatial representation of protein sequences (the Configuration Space of Homologous Proteins, or CSHP), with probabilities deduced from Z-value statistics (Monte Carlo methods applied to pairwise comparisons) [23].
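The Z-value statistic referred to here can be illustrated with a minimal Python sketch. It is not the TULIP implementation: the scoring scheme (fixed match/mismatch scores and a linear gap penalty instead of a substitution matrix), the choice to shuffle only the second sequence, and the names `sw_score` and `z_value` are all assumptions made for illustration. The sketch computes a Smith-Waterman local alignment score, then compares it with the scores obtained after shuffling one sequence, expressing the difference in standard deviations.

```python
import random
import statistics


def sw_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score.

    Assumption: simple match/mismatch scores and a linear gap penalty,
    chosen for brevity (a real pipeline would likely use a substitution
    matrix and affine gaps).
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def z_value(a, b, n_shuffles=100, seed=0):
    """Monte Carlo Z-value of the alignment score of a and b.

    Assumption: only b is shuffled; shuffling preserves its length and
    amino acid composition while destroying residue order.
    """
    rng = random.Random(seed)
    observed = sw_score(a, b)
    residues = list(b)
    null_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)
        null_scores.append(sw_score(a, "".join(residues)))
    mean = statistics.mean(null_scores)
    sd = statistics.stdev(null_scores)
    if sd == 0:
        # Degenerate null distribution: no spread to normalise by.
        return float("inf") if observed > mean else 0.0
    return (observed - mean) / sd
```

Calling `z_value(seq_a, seq_b, n_shuffles=1000)` for every pair in a multi-FASTA input would yield a Z-value matrix of the kind the server reports. Because shuffling preserves composition, the null distribution reflects each sequence's own amino acid bias, which is why Z-values suit compositionally biased sequences.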
