Predicting the performance of automated crystallographic model-building pipelines.

Emad Alharbi, Radu Calinescu, Paul Bond, Kevin Cowtan

doi:10.1107/s2059798321010500

Open Access

https://doi.org/10.1107/s2059798321010500

Abstract

Proteins are macromolecules that perform essential biological functions which depend on their three-dimensional structure. Determining this structure involves complex laboratory and computational work. For the computational work, multiple software pipelines have been developed to build models of the protein structure from crystallographic data. Each of these pipelines performs differently depending on the characteristics of the electron-density map received as input. Identifying the best pipeline to use for a protein structure is difficult, as the pipeline performance differs significantly from one protein structure to another. As such, researchers often select pipelines that do not produce the best possible protein models from the available data. Here, a software tool is introduced which predicts key quality measures of the protein structures that a range of pipelines would generate if supplied with a given crystallographic data set. These measures are crystallographic quality-of-fit indicators based on included and withheld observations, and structure completeness. Extensive experiments carried out using over 2500 data sets show that the tool yields accurate predictions for both experimental phasing data sets (at resolutions between 1.2 and 4.0 Å) and molecular-replacement data sets (at resolutions between 1.0 and 3.5 Å). The tool can therefore provide a recommendation to the user concerning the pipelines that should be run in order to proceed most efficiently to a depositable model.

Highlights

The first protein structures were determined in the 1950s using X-ray crystallography (Kendrew et al., 1958).
Mean absolute error (MAE) and root-mean-square error (RMSE) were calculated for the ML predictive model (P) and the median predictor (M) used as a baseline (Zero-R) model.
0.26) for predicting the protein structure completeness are higher than the MAE and RMSE for the other measures
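
The comparison described in the highlights, MAE and RMSE for a predictive model against a median baseline, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names and all completeness values below are hypothetical, chosen only to show how the two error measures and a median (Zero-R style) baseline are computed.

```python
import math
import statistics


def mae(y_true, y_pred):
    """Mean absolute error between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def median_baseline(train_targets, n):
    """Baseline that always predicts the median of the training targets."""
    m = statistics.median(train_targets)
    return [m] * n


# Hypothetical structure-completeness values (fractions), for illustration only.
train_completeness = [0.90, 0.75, 0.60, 0.95, 0.80]
test_completeness = [0.85, 0.70, 0.92]

# Hypothetical predictions from some trained model (P).
model_predictions = [0.80, 0.72, 0.90]

# Median predictor (M): ignores the inputs and repeats the training median.
baseline_predictions = median_baseline(train_completeness, len(test_completeness))

for name, preds in (("P", model_predictions), ("M", baseline_predictions)):
    print(name, "MAE:", mae(test_completeness, preds),
          "RMSE:", rmse(test_completeness, preds))
```

A model is only useful for a given measure if its MAE and RMSE are lower than those of the median baseline, which is why the baseline is reported alongside it.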

Summary

Introduction

The first protein structures were determined in the 1950s using X-ray crystallography (Kendrew et al., 1958). By 2020, the number of solved protein structures deposited in the Protein Data Bank (PDB) exceeded 154 000 (Berman et al., 2000; https://www.rcsb.org/stats/summary). To enable this progress, researchers have automated the computational work of determining the protein structure from X-ray crystallographic data sets. The resolution of the experimental observations, the quality of experimental phasing or the similarity of the molecular-replacement model, and many other features such as ice rings may affect the quality of the data. Each of these factors impacts the performance of different model-building algorithms in different ways (Vollmar et al., 2020; Alharbi et al., 2019; Morris et al., 2004).

Journal: Acta Crystallographica Section D Structural Biology	Publication Date: Nov 29, 2021
Citations: 3	License type: CC BY 4.0
