Abstract

Background: Identification of protein interacting sites is an important task in computational molecular biology. As more and more protein sequences are deposited without available structural information, it is highly desirable to predict protein binding regions from sequence alone. This paper presents a pattern mining approach to this problem. A functional region of a protein structure usually consists of several peptide segments linked by large wildcard regions. The proposed mining technique therefore allows large irregular gaps when growing patterns, in order to find residues that are simultaneously conserved but widely separated in the sequence. A derived pattern is called a cluster-like pattern because the discovered conserved residues are always grouped into several blocks, each of which corresponds to a locally conserved region of the protein sequence.

Results: The experiments conducted in this work demonstrate that the derived long patterns automatically discover the important residues that form one or several hot regions of protein-protein interactions. The methodology is evaluated on the web server MAGIIC-PRO using a well-known benchmark containing 220 protein chains from 72 distinct complexes. Among the 218 proteins tested, 900 sequential blocks were discovered, 4.25 blocks per protein chain on average. About 92% of the derived blocks are clustered in space with at least one other block, and about 66% of the blocks lie near the interface of protein-protein interactions. In summary, for about 83% of the tested proteins, at least two interacting blocks can be discovered by this approach.

Conclusion: This work aims to demonstrate that the important residues associated with the interface of protein-protein interactions may be automatically discovered by sequential pattern mining. The detected regions are highly conserved and are therefore regarded as computational hot regions. This information would be useful for characterizing protein sequences, predicting protein function, finding potential partners, and facilitating protein docking for drug discovery.
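To make the notion of a cluster-like pattern concrete, the following is a minimal sketch of how a pattern made of conserved blocks separated by large, variable-length gaps could be matched against a sequence. It is not the MAGIIC-PRO mining algorithm: it only matches a pattern that is already given and does not discover one. The function names, the use of `x` as a one-residue wildcard inside a block, and the toy sequence are all assumptions introduced for illustration.

```python
import re

def compile_cluster_pattern(blocks, gaps):
    """Build a regex from conserved blocks joined by flexible wildcard gaps.

    blocks: list of strings of amino-acid letters; 'x' (hypothetical
            notation) stands for any single residue inside a block.
    gaps:   list of (min_len, max_len) tuples, one per pair of adjacent blocks.
    """
    if len(gaps) != len(blocks) - 1:
        raise ValueError("need exactly one gap between adjacent blocks")
    parts = []
    for i, block in enumerate(blocks):
        # Wrap each block in a capture group so its position can be recovered.
        parts.append("(" + block.replace("x", ".") + ")")
        if i < len(gaps):
            lo, hi = gaps[i]
            # A gap is a run of arbitrary residues of bounded length.
            parts.append(".{%d,%d}" % (lo, hi))
    return re.compile("".join(parts))

def find_blocks(pattern, sequence):
    """Return (start, end) spans of each block for the first match, or []."""
    match = pattern.search(sequence)
    if match is None:
        return []
    return [match.span(g) for g in range(1, pattern.groups + 1)]

# Toy example: two short blocks separated by a gap of 5 to 20 residues.
toy_sequence = "MAHGDSLLLLLLLLKKCPVCTR"  # made-up sequence, not real data
toy_pattern = compile_cluster_pattern(["HxD", "CxxC"], [(5, 20)])
print(find_blocks(toy_pattern, toy_sequence))  # [(2, 5), (16, 20)]
```

In this toy case the two blocks sit 11 residues apart, so the pattern matches with a gap range of 5 to 20 but would not match if the gap were restricted to, say, 0 to 3 residues. Allowing wide gap ranges is what lets a single pattern cover residues that are far apart in sequence.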

Highlights

  • Identification of protein interacting sites is an important task in computational molecular biology

  • Using the five proteins in the first dataset, we investigate the potential of sequential pattern mining for identifying hot regions of protein-protein interactions by carefully examining the discovered patterns

  • Among the 220 protein chains in the second dataset, two protein chains are excluded from the test set because the protein sequence of the protein chain [Protein Data Bank (PDB):1ml0, chain B] is not available in the PDB file and the protein chain [PDB:1m10, chain A] does not have enough homologues for pattern mining (< 5 homologues)

Introduction

Identification of protein interacting sites is an important task in computational molecular biology. Identification of functionally important regions directly from a protein sequence is a challenging problem in molecular biology [1,2,3,4,5,6,7]. Alanine scanning mutagenesis [18], which estimates the energetic contribution of individual side chains, suggests that a small set of interface residues contributes most of the binding free energy [15,16,19]. These critical residues are called hot spots; they give rise to a significant increase in the absolute binding energy when mutated to alanine [15,16,20]. Interestingly, hot spots are not uniformly spread along the interfaces. Instead, they are clustered into densely packed regions and are surrounded by energetically less important residues, which might serve to occlude bulk solvent from the hot spots [15]. The hot spots and some moderately conserved residues both contribute to the stability of the complex [17].
