Ranking near-native candidate protein structures via random forest classification

Hongjie Wu, Haiou Li, Yijie Ding, Weizhong Lu, Hongmei Huang, Qiming Fu, Jing Qiu

doi:10.1186/s12859-019-3257-8

Open Access

Journal: BMC Bioinformatics	Publication Date: Dec 1, 2019
Citations: 6	License type: open-access

Affiliation: Suzhou University of Science and Technology

Abstract

Background: In ab initio protein-structure prediction, a large set of structural decoys is often generated, from which the best five or three candidates must be selected. The cluster-center structures with the largest number of neighbors are frequently regarded as the near-native protein structures with the lowest free energy; however, limitations in clustering methods and in three-dimensional structural-distance assessment make it difficult to identify the exact order of the best five or three near-native candidate structures.

Results: To address this issue, we propose a method that re-ranks the candidate structures via random forest classification, using intra- and inter-cluster features derived from the clustering results. Comparative analysis indicated that our method identified the order of the candidate structures better than the current methods SPICKER, Calibur, and Durandal. The results confirmed that the first model identified by our method was closer to the native structure in 12 of 43 cases versus four for SPICKER, and the same as the native structure in up to 27 of 43 cases versus 14 for Calibur and in up to eight of 43 cases versus two for Durandal.

Conclusions: In this study, we presented an improved method based on random forest classification that transforms the problem of re-ranking the candidate structures into a binary classification problem. Our results indicate that the method is effective for this problem and performs better than the other methods compared.
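The clustering baseline mentioned in the Background (treating the structure with the most neighbors as the near-native candidate) can be illustrated with a minimal sketch. This is not the authors' random forest method and not the SPICKER, Calibur, or Durandal implementation; the function name, the distance matrix, and the cutoff value are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def rank_by_neighbor_count(dist: Sequence[Sequence[float]], cutoff: float) -> List[int]:
    """Rank decoys by how many other decoys lie within `cutoff` of them.

    `dist` is a symmetric pairwise structural-distance matrix (for example
    pairwise RMSD). Returns decoy indices, most neighbors first; ties are
    broken by the lower index.
    """
    n = len(dist)
    counts = [
        sum(1 for j in range(n) if j != i and dist[i][j] <= cutoff)
        for i in range(n)
    ]
    return sorted(range(n), key=lambda i: (-counts[i], i))


# Hypothetical pairwise distances between four decoys (illustrative values only).
example_dist = [
    [0.0, 1.5, 3.0, 6.0],
    [1.5, 0.0, 1.8, 5.5],
    [3.0, 1.8, 0.0, 5.0],
    [6.0, 5.5, 5.0, 0.0],
]

ranking = rank_by_neighbor_count(example_dist, cutoff=2.0)
print(ranking)  # decoy 1 has two neighbors within the cutoff, so it ranks first
```

A ranking of this kind depends only on neighbor counts, which is one reason the exact order of the top candidates can be unreliable; the paper's contribution is to re-rank such candidates with a classifier trained on intra- and inter-cluster features.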

Highlights

In ab initio protein-structure prediction, a large set of structural decoys is often generated, from which the best five or three candidates must be selected.
Comparison of the three clustering methods with random forest classification: we evaluated the ability of the method to identify near-native structures relative to that of previous methods according to clustering methodology.
Fifteen of the models predicted by the random forest classifier were closer to the native structure than those predicted by SPICKER, 22 were the same, and six had a higher root-mean-square deviation (RMSD), resulting in a 21% increase in accuracy.

Summary

Results

Datasets: Four datasets are employed in the experiments: the I-TASSER Decoy Set-I, the QUARK Decoy Set, the CASP10 dataset, and the CASP11 dataset, which were generated by I-TASSER and QUARK (https://zhanglab.ccmb.med.umich.edu/decoys/).

Fifteen of the models predicted by the random forest classifier were closer to the native structure than those predicted by SPICKER, 22 were the same, and six had higher RMSDs, resulting in a 21% increase in accuracy. Eleven of the models predicted by the random forest classifier were closer to the native structure than those predicted by Calibur, 19 were the same, and 13 were worse, resulting in a 4% increase in accuracy. Sixteen of the models predicted by the random forest classifier were closer to the native structure than those predicted by Durandal, 19 were the same, and eight were worse, resulting in an 18% increase in accuracy. Sixteen of the models predicted by the random forest classifier were closer to the native structure than those predicted by SPICKER, 17 were the same, and ten were worse, resulting in a 14% increase in accuracy.

In the Calibur and Durandal model comparison, the RF_Calibur model (1kjs, RMSD 5.89) and the RF_Durandal model (1kjs, RMSD 5.92) successfully built the short helix, unlike the Calibur model (1kjs, RMSD 8.44) and the Durandal model (1kjs, RMSD 8.74), and aligned well with the native model.
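The closer/same/worse counts reported above come from comparing, target by target, the RMSD of the first model chosen by two methods. The sketch below shows one way such a tally could be computed. The function name and the tolerance parameter are assumptions, and only the first RMSD pair (1kjs, RF_Calibur 5.89 versus Calibur 8.44) is taken from the text; the remaining values are made up for illustration. How the reported accuracy percentages were derived from these counts is not stated here, so the sketch does not attempt it.

```python
from typing import Dict, Sequence


def tally_first_models(
    rmsd_new: Sequence[float],
    rmsd_ref: Sequence[float],
    tol: float = 0.0,
) -> Dict[str, int]:
    """Count targets where the new method's first model is closer to the
    native structure (lower RMSD), the same, or worse than the reference's.

    `tol` is a hypothetical tolerance below which two RMSDs count as equal.
    """
    if len(rmsd_new) != len(rmsd_ref):
        raise ValueError("both methods must be evaluated on the same targets")
    counts = {"closer": 0, "same": 0, "worse": 0}
    for new, ref in zip(rmsd_new, rmsd_ref):
        if abs(new - ref) <= tol:
            counts["same"] += 1
        elif new < ref:
            counts["closer"] += 1
        else:
            counts["worse"] += 1
    return counts


# First pair is the 1kjs example from the text; the other two are illustrative.
rf_calibur = [5.89, 4.10, 7.25]
calibur = [8.44, 4.10, 6.90]

result = tally_first_models(rf_calibur, calibur)
print(result)
```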

