Improving the chances of successful protein structure determination with a random forest classifier.

Samad Jahandideh, Adam Godzik, Lukasz Jaroszewski

doi:10.1107/s1399004713032070

Abstract

Obtaining diffraction-quality crystals remains one of the major bottlenecks in structural biology. The ability to predict the chances of crystallization from the amino-acid sequence of the protein can, at least partly, address this problem by allowing a crystallographer to select homologs that are more likely to succeed and/or to modify the sequence of the target to avoid features that are detrimental to successful crystallization. In 2007, the now widely used XtalPred algorithm [Slabinski et al. (2007), Protein Sci. 16, 2472-2482] was developed. XtalPred classifies proteins into five 'crystallization classes' based on a simple statistical analysis of the physicochemical features of a protein. Here, towards the same goal, advanced machine-learning methods are applied and, in addition, the predictive potential of additional protein features, such as predicted surface ruggedness, hydrophobicity, side-chain entropy of surface residues and amino-acid composition of the predicted protein surface, is tested. The new XtalPred-RF (random forest) achieves a significant improvement in the prediction of crystallization success over the original XtalPred. To illustrate this, XtalPred-RF was tested by revisiting target selection from 271 Pfam families targeted by the Joint Center for Structural Genomics (JCSG) in PSI-2, and it was estimated that the number of targets entered into the protein-production and crystallization pipeline could have been reduced by 30% without lowering the number of families for which the first structures were solved. The prediction improvement depends on the subset of targets used as a testing set and reaches 100% (i.e. twofold) for the top class of predicted targets.
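As a rough illustration of what sequence-derived input features for such a classifier can look like, the sketch below computes a protein's length, its mean Kyte-Doolittle hydropathy (GRAVY) and its amino-acid composition from the whole sequence. It is an assumption-laden sketch, not the XtalPred-RF feature set: the abstract refers to features of the predicted protein surface, which require predictions that are not attempted here, and the helper name sequence_features is hypothetical.

```python
from collections import Counter

# Kyte-Doolittle hydropathy scale (standard published values).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def sequence_features(seq: str) -> dict:
    """Hypothetical helper: simple whole-sequence features.

    Assumption: these stand in for the surface-based features named in
    the abstract; they are NOT the features used by XtalPred-RF.
    """
    # Keep only the 20 standard residues; anything else is ignored.
    residues = [aa for aa in seq.upper() if aa in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")

    n = len(residues)
    counts = Counter(residues)

    features = {
        "length": n,
        # GRAVY: mean hydropathy over all residues.
        "gravy": sum(KYTE_DOOLITTLE[aa] for aa in residues) / n,
    }
    # Fraction of each standard amino acid, in a fixed (sorted) order.
    for aa in sorted(KYTE_DOOLITTLE):
        features[f"frac_{aa}"] = counts[aa] / n
    return features


if __name__ == "__main__":
    feats = sequence_features("MKTAYIAKQR")  # made-up example sequence
    for name in ("length", "gravy", "frac_A", "frac_K"):
        print(name, feats[name])
```

A feature dictionary of this kind, computed for each candidate target, could be turned into a fixed-order numeric vector and passed to any random forest implementation. The training data, class labels and model settings used in the paper are not given in this abstract and are not reproduced here.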

Journal: Acta Crystallographica Section D Biological Crystallography
Publication Date: Feb 15, 2014
Citations: 49
