Knowledge of protein structures is essential for understanding proteins' functions, evolution, dynamics, stabilities, and interactions, and for data-driven protein or drug design. Yet experimental structure determination rates are far exceeded by those of next-generation sequencing, so that fewer than 1 in 1,000 proteins has an experimentally known 3D structure. Computational structure prediction seeks to alleviate this problem, and the Critical Assessment of Protein Structure Prediction (CASP) has shown the value of consensus and meta-methods that utilize complementary algorithms. However, such methods traditionally employ majority voting during template selection and model averaging during refinement, which can drive the model away from the native fold if that fold is underrepresented in the ensemble. Here, we present TopModel, a fully automated meta-method for protein structure prediction. In contrast to traditional consensus and meta-methods, TopModel uses top-down consensus and deep neural networks to select templates and to identify and correct wrongly modeled regions. TopModel combines a broad range of state-of-the-art methods for threading, alignment, and model quality estimation, and provides a versatile workflow and toolbox for template-based structure prediction. On the CASP10-12 datasets, TopModel shows superior template selection, alignment accuracy, and model quality for template-based structure prediction compared with 12 state-of-the-art stand-alone primary predictors. TopModel was validated by prospective predictions of the nisin resistance protein (NSR) from Streptococcus agalactiae and LipoP from Clostridium difficile, showing far better agreement with experimental data than any of its constituent primary predictors.
Overall, these results demonstrate the utility of TopModel for protein structure prediction and, in particular, show how combining computational structure prediction with sparse or low-resolution experimental data can improve the final model.