Ab initio modeling of small proteins by iterative TASSER simulations

Sitao Wu, Jeffrey Skolnick, Yang Zhang

doi:10.1186/1741-7007-5-17

Open Access

Abstract

Background: Predicting 3-dimensional protein structures from amino-acid sequences is an important unsolved problem in computational structural biology. The problem becomes relatively easier if close homologous proteins have been solved, as high-resolution models can be built by aligning target sequences to the solved homologous structures. However, for sequences without similar folds in the Protein Data Bank (PDB) library, the models have to be predicted from scratch. Progress in ab initio structure modeling is slow. The aim of this study was to extend the TASSER (threading/assembly/refinement) method to ab initio modeling and to examine systematically its ability to fold small single-domain proteins.

Results: We developed I-TASSER by iteratively implementing the TASSER method, and tested its folding ability on three benchmarks of small proteins. First, 16 small proteins (< 90 residues) were used to generate I-TASSER models, which had an average Cα-root mean square deviation (RMSD) of 3.8 Å, with 6 of them having a Cα-RMSD < 2.5 Å. The overall result was comparable with the all-atomic ROSETTA simulation, but the central processing unit (CPU) time used by I-TASSER was much shorter (5 CPU hours vs. 150 CPU days for ROSETTA). Second, 20 small proteins (< 120 residues) were used. I-TASSER folded four of them with a Cα-RMSD < 2.5 Å. The average Cα-RMSD of the I-TASSER models was 3.9 Å, whereas it was 5.9 Å using TOUCHSTONE-II software. Finally, 20 non-homologous small proteins (< 120 residues) were taken from the PDB library. An average Cα-RMSD of 3.9 Å was obtained for this third benchmark, with seven cases having a Cα-RMSD < 2.5 Å.

Conclusion: Our simulation results show that I-TASSER can consistently predict the correct folds, and sometimes high-resolution models, for small single-domain proteins. Compared with other ab initio modeling methods such as ROSETTA and TOUCHSTONE-II, the average performance of I-TASSER is either much better, or similar with a shorter computational time. These data, together with the significant performance of the automated I-TASSER server (the Zhang-Server) in the 'free modeling' section of the recent Critical Assessment of Structure Prediction (CASP)7 experiment, demonstrate new progress in automated ab initio model generation. The I-TASSER server is freely available for academic users.
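The benchmark figures above are reported as Cα-RMSD, i.e. the root mean square distance between matched Cα atoms of a model and the native structure. The following is a minimal sketch of that quantity, not the authors' evaluation code. It assumes the two coordinate sets are already optimally superimposed and listed in the same residue order; the superposition step itself is not shown, and the function name `ca_rmsd` is hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def ca_rmsd(model: Sequence[Coord], native: Sequence[Coord]) -> float:
    """RMSD in the input units (e.g. angstroms) over matched C-alpha atoms.

    Assumption: both structures are already superimposed and aligned
    residue by residue; no fitting is performed here.
    """
    if len(model) != len(native) or not model:
        raise ValueError("need two non-empty coordinate lists of equal length")
    # Sum of squared distances between each matched pair of atoms.
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(model, native):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(model))


# Toy coordinates (not real protein data): every atom is displaced by 3 units.
toy_native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
toy_model = [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)]
toy_value = ca_rmsd(toy_model, toy_native)
```

With such a function, a threshold like the 2.5 Å used in the abstract becomes a simple comparison on the returned value.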

Highlights

Predicting 3-dimensional protein structures from amino-acid sequences is an important unsolved problem in computational structural biology
If a template could be detected by the Position Specific Iterative (PSI)-BLAST program with an E-value < 0.05, it would be excluded
We note that the homology exclusion cutoff used here is more stringent than that used by Bradley et al [13], who only excluded templates with a PSI-BLAST E-value < 0.05 but without sequence identity cutoff, and that used by Zhang et al [12], who only excluded the templates with sequence identity > 30% but without PSI-BLAST checking
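The homology exclusion described in these highlights can be sketched as a simple filter over candidate templates. This is an illustration under stated assumptions, not the authors' pipeline: it assumes the two criteria (PSI-BLAST E-value < 0.05, sequence identity > 30%) are combined with a logical OR, which the comparison with [12] and [13] suggests but the highlights do not state outright. The `TemplateHit` record, its field names, and the template identifiers are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

EVALUE_CUTOFF = 0.05    # PSI-BLAST cutoff quoted in the highlights
IDENTITY_CUTOFF = 0.30  # assumption: 30% identity, taken from the comparison with [12]


@dataclass
class TemplateHit:
    template_id: str                         # hypothetical identifier
    seq_identity: float                      # fraction in [0, 1] of identical aligned residues
    psiblast_evalue: Optional[float] = None  # None if PSI-BLAST reports no hit


def is_homologous(hit: TemplateHit) -> bool:
    """True if the template should be excluded under either criterion."""
    detected = hit.psiblast_evalue is not None and hit.psiblast_evalue < EVALUE_CUTOFF
    too_similar = hit.seq_identity > IDENTITY_CUTOFF
    return detected or too_similar


def exclude_homologs(hits: List[TemplateHit]) -> List[TemplateHit]:
    """Keep only templates that pass both exclusion checks."""
    return [h for h in hits if not is_homologous(h)]


# Toy hits (made-up values) covering each case.
hits = [
    TemplateHit("tmplA", seq_identity=0.45),                        # excluded: identity > 30%
    TemplateHit("tmplB", seq_identity=0.18, psiblast_evalue=1e-4),  # excluded: E-value < 0.05
    TemplateHit("tmplC", seq_identity=0.12, psiblast_evalue=2.0),   # kept
    TemplateHit("tmplD", seq_identity=0.20),                        # kept: not detected
]
kept_ids = [h.template_id for h in exclude_homologs(hits)]
```

Using OR makes the filter at least as strict as either single-criterion scheme, which matches the claim that the cutoff here is more stringent than those of [12] and [13].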

Summary

Introduction

Predicting 3-dimensional protein structures from amino-acid sequences is an important unsolved problem in computational structural biology. The problem becomes relatively easier if close homologous proteins have been solved, as high-resolution models can be built by aligning target sequences to the solved homologous structures. For sequences without similar folds or templates in the Protein Data Bank (PDB) library, the models have to be built from scratch, i.e. by ab initio folding. This is the most difficult category of protein-structure prediction [16,17].