Abstract

Proteins often work as oligomers or multimers in vivo. Therefore, elucidating their oligomeric or multimeric form (quaternary structure) is crucially important to ascertain their function. X-ray crystal structures of numerous proteins have been accumulated, providing information related to their biological units. Extracting information of biological units from protein crystal structures represents a meaningful task for modern biology. Nevertheless, although many methods have been proposed for identifying biological units appearing in protein crystal structures, it is difficult to distinguish biological protein–protein interfaces from crystallographic ones. Therefore, our simple but highly accurate classifier was developed to infer biological units in protein crystal structures using large amounts of protein sequence information and a modern contact prediction method to exploit covariation signals (CSs) in proteins. We demonstrate that our proposed method is promising even for weak signals of biological interfaces. We also discuss the relation between classification accuracy and conservation of biological units, and illustrate how the selection of sequences included in multiple sequence alignments as sources for obtaining CSs affects the results. With increased amounts of sequence data, the proposed method is expected to become increasingly useful.

Highlights

  • In recent decades, various methods and analyses have been reported for identifying biological units in protein crystal structures

  • Protein Sparse InverseCOVariance (PSICOV)[24] filters out non-promising multiple sequence alignments (MSAs) with the diversity criterion: MSA must contain at least as many non-redundant sequence clusters as the query length

  • We compared the numbers of sequences in the MSAs, which can pass the PSICOV criterion, generated using databases of 2011 and 2016

Read more

Summary

Introduction

Various methods and analyses have been reported for identifying biological units in protein crystal structures. A few buried residues, comprising the so-called “core”, are more conserved than surrounding interface residues[16] Work in this area was improved further with a new classifier, EPPIC, which uses the Shannon entropy ratio based only on fairly similar homologous sequences. EPPIC performed well with the new difficult dataset[17] Some parameters such as contact size and geometric complementarity work to some degree, each parameter alone is not always sufficient for interface classification. We applied contact prediction for the interface classification problem in protein crystals where actual contacts are already given, which demonstrated that using features based on CSs is promising for this field of study. This study elucidates differences between the contact prediction of intrachain and that of the interaction interface

Methods
Results
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.