Abstract

Phylogenetic methods are key to providing models for how a given protein family evolved. However, these methods run into difficulties when sequence divergence is either too low or too high. Here, we present a case study of Hox and ParaHox proteins to show how a new computational approach can yield additional insights into old classification problems. For two of the three ParaHox proteins (Gsx and Cdx), the assignments differ between the currently most established view and four alternative scenarios. We use a non-phylogenetic, pairwise-sequence-similarity-based method to assess which of the previous predictions, if any, are best supported by the sequence-similarity relationships between Hox and ParaHox proteins. The overall sequence similarities show Gsx to be most similar to Hox2–3, and Cdx to be most similar to Hox4–8. The results indicate that a purely pairwise-sequence-similarity-based approach can provide additional information not only when phylogenetic inference methods have insufficient information to provide reliable classifications (as was shown previously for central Hox proteins), but also when sequence variation is so high that the resulting phylogenetic reconstructions are likely plagued by long-branch-attraction artifacts.
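The core idea of a pairwise-similarity assignment can be illustrated with a minimal sketch: score a query (standing in for a ParaHox protein) against every member of each reference group (standing in for a Hox group) and report the group with the highest mean similarity. This is not the authors' pipeline. The scoring scheme (Needleman-Wunsch global alignment with match = 1, mismatch = -1, gap = -2, summarized as fraction identity), the mean-based group summary, the function names and the toy sequences are all assumptions chosen for illustration.

```python
from statistics import mean

# Illustrative scoring scheme (assumption, not the published method).
MATCH, MISMATCH, GAP = 1, -1, -2


def pair_score(x: str, y: str) -> int:
    """Score for aligning residue x against residue y."""
    return MATCH if x == y else MISMATCH


def global_identity(a: str, b: str) -> float:
    """Needleman-Wunsch global alignment of a and b.

    Returns the fraction of alignment columns that hold identical residues.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * GAP
    for j in range(1, m + 1):
        score[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),
                score[i - 1][j] + GAP,  # residue of a against a gap
                score[i][j - 1] + GAP,  # residue of b against a gap
            )
    # Trace back from the bottom-right cell, counting columns and identities.
    i, j, columns, identities = n, m, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            identities += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + GAP:
            i -= 1
        else:
            j -= 1
        columns += 1
    return identities / columns if columns else 0.0


def closest_group(query: str, groups: dict[str, list[str]]) -> tuple[str, dict[str, float]]:
    """Assign query to the reference group with the highest mean pairwise identity."""
    means = {name: mean(global_identity(query, s) for s in seqs) for name, seqs in groups.items()}
    return max(means, key=means.get), means


# Hypothetical toy sequences; they are not real Hox or ParaHox data.
groups = {
    "group_A": ["ACDEFGHIK", "ACDEFGHIR"],
    "group_B": ["WWYYPPQQS", "WWYYPPQQT"],
}
best, means = closest_group("ACDEFGHIL", groups)
print(best, {k: round(v, 3) for k, v in means.items()})
```

A realistic analysis would likely use a substitution matrix such as BLOSUM62 instead of simple match/mismatch scores. Averaging over all group members, rather than taking only the single best hit, keeps one unusually similar sequence from dominating the assignment.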

Highlights

  • One key feature of classifying protein sequences is that the classification provides the best surrogate measure for predicting likely functional properties of novel protein sequences by comparing and transferring information from better-described proteins in closely related groups or clades

  • As little is known about how precisely ParaHox proteins carry out their molecular functions and which sequence elements are relevant to their function, we decided to exclude potential user-induced biases (e.g., from scientists focused on Hox research) and analyze the full-length protein sequences rather than focusing on specific domains or subsections of these proteins

  • Our basic approach to identifying Hox/ParaHox proteins likely ensured the inclusion of all Hox and ParaHox protein sequences: the search also retrieved plant TALE homeodomain sequences and Pax sequences, which lie well beyond the stated scope of this study

Introduction

One key feature of classifying protein sequences is that the classification provides the best surrogate measure for predicting likely functional properties of novel protein sequences by comparing and transferring information from better-described proteins in closely related groups or clades. A very rudimentary prediction of protein function can be based on comparisons of domain compositions (e.g., [1]); all homeodomain-containing proteins are presumed to, at least, have the ability to bind to DNA. For proteins with highly similar domain compositions, additional sequence features can subsequently be employed to refine the classification and prediction of protein function (e.g., see [2]). Such additional features might be the complete amino-acid sequence of a protein or, if additional structural knowledge is available, a focused
