Abstract

Background

The study of functional subfamilies of protein domain families and the identification of the residues that determine substrate specificity is an important question in the analysis of protein domains. One way to address this question is to cluster protein sequence data and to predict functional residues on the basis of such clusterings. The locations of putative functional residues in known protein structures provide insights into how different substrate specificities are reflected at the level of protein structure.

Results

We have developed an extension of the context-specific independence (CSI) mixture model clustering framework that allows for the integration of experimental data. Because such data are usually available for only a few proteins, our algorithm implements a partially-supervised learning approach. To demonstrate the usefulness of our approach, we discover domain subfamilies and predict functional residues for four protein domain families: phosphatases, pyridoxal-dependent decarboxylases, WW domains and SH3 domains.

Conclusion

The partially-supervised clustering revealed biologically meaningful subfamilies even for highly heterogeneous domains, and the predicted functional residues provide insights into the basis of the different substrate specificities.
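The abstract does not spell out the algorithm, so the following is only a minimal sketch of what "partially-supervised" clustering can mean in a mixture-model setting. It assumes a plain mixture of position-independent categorical distributions over aligned residues, fitted by expectation-maximization (EM); the context-specific parameter sharing that distinguishes the CSI framework is deliberately omitted. Sequences with an experimentally known subfamily keep their responsibilities clamped to that component, while all other sequences are assigned by EM. The function name `partially_supervised_em`, the alphabet, the pseudocount of 0.5 and the toy alignment are hypothetical choices for illustration.

```python
import math

# Hypothetical alphabet: the 20 amino acids plus the alignment gap character.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"


def partially_supervised_em(seqs, labels, k, n_iter=50, pseudo=0.5):
    """EM for a mixture of position-independent categorical distributions.

    seqs:   equal-length aligned sequences over ALPHABET
    labels: dict {sequence index: component} for experimentally characterised sequences
    k:      number of mixture components (putative subfamilies)
    Returns the mixture weights and column distributions of the final M-step,
    and the final responsibilities.
    """
    n, length = len(seqs), len(seqs[0])
    # Labelled sequences start (and stay) one-hot; unlabelled ones start uniform.
    resp = []
    for i in range(n):
        if i in labels:
            resp.append([1.0 if j == labels[i] else 0.0 for j in range(k)])
        else:
            resp.append([1.0 / k] * k)
    for _ in range(n_iter):
        # M-step: responsibility-weighted residue counts plus pseudocounts.
        weights = [sum(r[j] for r in resp) / n for j in range(k)]
        probs = []
        for j in range(k):
            columns = []
            for pos in range(length):
                counts = dict.fromkeys(ALPHABET, pseudo)
                for seq, r in zip(seqs, resp):
                    counts[seq[pos]] += r[j]
                total = sum(counts.values())
                columns.append({a: c / total for a, c in counts.items()})
            probs.append(columns)
        # E-step: update only unlabelled sequences; labelled ones remain clamped.
        for i, seq in enumerate(seqs):
            if i in labels:
                continue
            logp = [math.log(max(weights[j], 1e-300))
                    + sum(math.log(probs[j][pos][a]) for pos, a in enumerate(seq))
                    for j in range(k)]
            top = max(logp)  # log-sum-exp trick for numerical stability
            expd = [math.exp(x - top) for x in logp]
            z = sum(expd)
            resp[i] = [x / z for x in expd]
    return weights, probs, resp


if __name__ == "__main__":
    toy = ["ACDA", "ACDA", "ACEA", "WWYA", "WWYA", "WVYA"]  # hypothetical alignment
    known = {0: 0, 3: 1}  # only two sequences carry experimental labels
    _, _, resp = partially_supervised_em(toy, known, k=2)
    print([max(range(2), key=lambda j: r[j]) for r in resp])  # [0, 0, 0, 1, 1, 1]
```

Starting unlabelled sequences from uniform responsibilities keeps the sketch deterministic: the clamped labels alone break the symmetry between components, so in this toy the two labelled sequences pull their unlabelled neighbours into the matching components.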

Introduction

The study of functional subfamilies of protein domain families and the identification of the residues that determine substrate specificity is an important question in the analysis of protein domains. Many previous studies have focused on how to find the functional residues of a given protein family when its proteins have already been assigned to subfamilies. These methods include approaches based on information-theoretical measures such as relative entropy [1,2] or mutual information [3], template-based similarity scores to known functional residues [4], approaches that contrast position-specific conservation in orthologues and paralogues [5] or superfamilies [6], and comparisons to known reference 3D structures to find discriminatory surface residues [7]. […] Any tree-based approach is assured to fail. […] Such data sets arise, for instance, from detailed molecular studies in which individual proteins are examined for their substrate specificity [15,16] and each experiment requires considerable effort. […] In [18], we presented a method based on the context-specific independence (CSI) mixture framework for clustering protein sequences and simultaneously predicting functional residues.
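Several of the approaches cited above score alignment columns with information-theoretic measures once subfamilies are known. As an illustration only (it is neither the exact formulation of [1-3] nor the CSI-based prediction of [18]), the sketch below scores each column by the subfamily-size-weighted relative entropy between each subfamily's residue distribution and the overall column distribution. This weighted sum equals the mutual information between the residue at that column and the subfamily label, which links the two measures mentioned above. The function name `column_scores` and the toy fragments are hypothetical, and the sketch uses raw frequencies without the pseudocounts or sequence weighting a practical implementation would likely need.

```python
import math
from collections import Counter


def column_scores(seqs, assignment):
    """Score each alignment column by how well its residues separate subfamilies.

    Computes sum_k (n_k / n) * KL(p_k || p) in bits, where p_k is the residue
    distribution of the column within subfamily k and p is the distribution over
    all sequences. This weighted relative entropy equals the mutual information
    between the residue at that column and the subfamily label.
    """
    n, length = len(seqs), len(seqs[0])
    groups = {}
    for seq, label in zip(seqs, assignment):
        groups.setdefault(label, []).append(seq)
    scores = []
    for pos in range(length):
        overall = Counter(seq[pos] for seq in seqs)
        score = 0.0
        for members in groups.values():
            m = len(members)
            within = Counter(seq[pos] for seq in members)
            # Residues present in the subfamily are also present overall, so no log(0).
            kl = sum((c / m) * math.log2((c / m) / (overall[a] / n))
                     for a, c in within.items())
            score += (m / n) * kl
        scores.append(score)
    return scores


if __name__ == "__main__":
    toy = ["KAL", "KAV", "EAL", "EAV"]  # hypothetical aligned fragments
    scores = column_scores(toy, [0, 0, 1, 1])
    print(scores)  # [1.0, 0.0, 0.0]: only column 0 separates the two subfamilies
    print(sorted(range(len(scores)), key=scores.__getitem__, reverse=True))
```

In the toy, column 1 is conserved across both subfamilies and column 2 varies within each subfamily in the same way, so neither carries information about the subfamily label; column 0 determines it completely and receives the maximal score of one bit for two equally sized subfamilies.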
