Abstract

Protein structure prediction is one of the most important problems in computational biology; it consists of determining the 3D structure of a protein from its amino acid sequence. A key component behind the considerable improvements of recent decades is the prediction of residue contacts within a protein, which provides fundamental information about its three-dimensional structure. In the 13th edition of CASP (Critical Assessment of protein Structure Prediction), notable progress was reported on both problems through the use of deep learning algorithms. In the contact prediction category, the best methods in CASP13 achieved an average precision of 70%. In the present work, the performance of these methods is analyzed on a larger data set of 483 proteins from four families according to the structural classification of the SCOP (Structural Classification of Proteins) database. The selected methods were evaluated using the CASP metrics, and the results indicate an average contact prediction precision greater than 90%. SPOT-Contact was the method with the best overall performance and was among the best-performing methods for each SCOP class. The set of proteins used for the experiments and the implementations developed for the analysis are publicly available.

Highlights

  • Proteins are one of the most biologically important macromolecules and have a wide variety of functions

  • Because these measures are highly correlated in the reduced lists of contacts, we focus the analysis on precision: it is the most intuitive measure and, given the application of contact prediction to protein structure prediction, it is important to keep the number of false positives (FP) as low as possible

  • Statistical tests were applied on the precisions for the complete protein data set, as well as on the precisions for the protein sets classified according to SCOP 1.75
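As a rough illustration of the precision measure mentioned in the highlights, the sketch below computes TP / (TP + FP) for a list of predicted contacts against a set of true contacts. The function name `contact_precision` and the toy residue pairs are hypothetical and are not taken from the paper or from the CASP evaluation tools; the actual CASP evaluation also restricts the list by sequence separation and list length, which is omitted here.

```python
def contact_precision(predicted_pairs, true_contacts):
    """Fraction of predicted contacts that are true contacts: TP / (TP + FP).

    Both arguments are iterables of (i, j) residue index pairs.
    Pairs are treated as unordered, so (i, j) and (j, i) are the same contact.
    """
    predicted = {tuple(sorted(pair)) for pair in predicted_pairs}
    truth = {tuple(sorted(pair)) for pair in true_contacts}
    if not predicted:
        return 0.0  # convention chosen for this sketch: empty list scores 0
    true_positives = len(predicted & truth)
    false_positives = len(predicted - truth)
    return true_positives / (true_positives + false_positives)


# Toy data (made up for illustration): 4 predictions, 3 of them correct.
toy_predicted = [(1, 10), (15, 2), (3, 30), (4, 9)]
toy_true = {(1, 10), (3, 30), (4, 9), (5, 20)}
toy_precision = contact_precision(toy_predicted, toy_true)
```

Every false positive lowers this value directly, which is why a low FP count matters when the predicted contacts are later used as restraints for structure prediction.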


Summary

Introduction

Proteins are among the most biologically important macromolecules and have a wide variety of functions. The atoms of residues in contact are considered to interact directly within the protein, and two residues are defined to be in contact if the Euclidean distance between their Cβ atoms (Cα in the case of glycine) is less than 8 Å (angstroms) [5]. The input for this subproblem is the sequence of L residues of the protein, and the output is a symmetric L×L matrix called the contact map, which represents the contacts between all pairs of residues. It is common to present the results in the form of a contact list, where each line
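The contact definition above (Cβ–Cβ distance below 8 Å, giving a symmetric L×L matrix) can be sketched in a few lines. This is a minimal illustration that assumes the Cβ coordinates (Cα for glycine) have already been extracted into an (L, 3) array; the function name `contact_map` and the toy coordinates are hypothetical and do not come from the paper's implementation.

```python
import numpy as np


def contact_map(coords, threshold=8.0):
    """Return a boolean L x L contact map from an (L, 3) coordinate array.

    coords: one 3D point per residue, in angstroms (assumed to be the
            C-beta atom, or C-alpha for glycine).
    threshold: distance cutoff in angstroms; pairs strictly below it
               are marked as contacts.
    """
    coords = np.asarray(coords, dtype=float)
    # Pairwise difference vectors via broadcasting: shape (L, L, 3).
    diff = coords[:, None, :] - coords[None, :, :]
    # Euclidean distance between every pair of residues: shape (L, L).
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # The diagonal is trivially True here because each self-distance is 0.
    return dist < threshold


# Toy coordinates (made up): four residues placed along the x axis.
toy_coords = np.array([
    [0.0, 0.0, 0.0],
    [3.8, 0.0, 0.0],
    [7.6, 0.0, 0.0],
    [20.0, 0.0, 0.0],
])
toy_map = contact_map(toy_coords)
```

The resulting matrix is symmetric by construction, since the distance between residues i and j does not depend on their order.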
