Biology in silico – a mixed bag: Computational Methods in Molecular Biology (New Comprehensive Biochemistry Vol. 32) edited by S. L. Salzberg, D. B. Searls and S. Kasif

Similar Papers
  • Research Article
  • 10.1142/s021800142250015x
Protein Structure Prediction Using Quantile Dragonfly and Structural Class-Based Deep Learning
  • Mar 14, 2022
  • International Journal of Pattern Recognition and Artificial Intelligence
  • Varanavasi Nallasamy + 1 more

Predicting three-dimensional structure of a protein in the field of computational molecular biology has received greater attention. Most of the recent research works aimed at exploring search space, however with the increasing nature and size of data, protein structure identification and prediction are still in the preliminary stage. This work is aimed at exploring search space to tackle protein structure prediction with minimum execution time and maximum accuracy by means of quantile regressive dragonfly and structural class homolog-based deep learning (QRD-SCHDL). The proposed QRD-SCHDL method consists of two distinct steps. They are protein structure identification and prediction. In the first step, protein structure identification is performed by means of QRD optimization model to identify protein structure with minimum error. Here the protein structure identification is first performed as the raw database contains sequence information and does not contain structural information. An optimization model is designed to obtain the structural information from the database. However, protein structure gives much more insight than its sequence. Therefore, to perform computational prediction of protein structure from its sequence, actual protein structure prediction is made. The second step involves the actual protein structure prediction via structural class and homolog-based deep learning. For each protein structure prediction, a scoring matrix is obtained by utilizing structural class maximum correlation coefficient. Finally, the proposed method is tested on a set of different unique numbers of protein data and compared to the state-of-the-art methods. The obtained results showed the potentiality of the proposed method in terms of metrics, error rate, protein structure prediction time, protein structure prediction accuracy, precision, specificity, recall, ROC, Kappa coefficient and [Formula: see text]-measure, respectively. 
It also shows that the proposed QRD-SCHDL method attains comparable results and outperformed in certain cases, thereby signifying the efficiency of the proposed work.

  • Research Article
  • Cited by 29
  • 10.1371/journal.pone.0092863
PSSP-RFE: Accurate Prediction of Protein Structural Class by Recursive Feature Extraction from PSI-BLAST Profile, Physical-Chemical Property and Functional Annotations
  • Mar 27, 2014
  • PLoS ONE
  • Liqi Li + 7 more

Protein structure prediction is critical to functional annotation of the massively accumulated biological sequences, which prompts an imperative need for the development of high-throughput technologies. As a first and key step in protein structure prediction, protein structural class prediction becomes an increasingly challenging task. Amongst most homological-based approaches, the accuracies of protein structural class prediction are sufficiently high for high similarity datasets, but still far from being satisfactory for low similarity datasets, i.e., below 40% in pairwise sequence similarity. Therefore, we present a novel method for accurate and reliable protein structural class prediction for both high and low similarity datasets. This method is based on Support Vector Machine (SVM) in conjunction with integrated features from position-specific score matrix (PSSM), PROFEAT and Gene Ontology (GO). A feature selection approach, SVM-RFE, is also used to rank the integrated feature vectors through recursively removing the feature with the lowest ranking score. The definitive top features selected by SVM-RFE are input into the SVM engines to predict the structural class of a query protein. To validate our method, jackknife tests were applied to seven widely used benchmark datasets, reaching overall accuracies between 84.61% and 99.79%, which are significantly higher than those achieved by state-of-the-art tools. These results suggest that our method could serve as an accurate and cost-effective alternative to existing methods in protein structural classification, especially for low similarity datasets.

  • Book Chapter
  • 10.1007/978-981-10-7455-4_8
Machine Learning Framework: Predicting Protein Structural Features
  • Jan 1, 2018
  • Pramod Kumar + 2 more

Structural biology is a challenging scientific discipline that aims to uncover the topologies and shapes of biomolecules and macromolecules—that is, DNA, RNA, and proteins. Proteins are large macromolecules consisting of more than one chain of amino acids joined together in a linear chain by peptide bonds. Proteins are required in organisms; they help in all biological processes of cells. They catalyze biochemical reactions (enzymes), carry out key roles in cellular processes, and act as structural constituents, catalysis agents, signaling molecules, and molecular machines of every biological system. They are responsible for immune responses, can store molecules (e.g., casein and ovalbumin store amino acids), and are even responsible for cell mechanics (e.g., actin and myosin). The structure prediction of proteins is a difficult task with basic problems in computational biology, structural science, and structural biology. The complex structure of protein prediction has four different levels: (1) one-dimensional (1D) prediction of different structural features and linear chain of amino acids; (2) two-dimensional (2D) prediction of spatial arrangements between amino acids; (3) three-dimensional (3D) (tertiary) structural features prediction of a protein; and (4) four-dimensional (4D) (quaternary) structure prediction of multicomplex proteins. Researchers have recently used most of the various data mining methods, different scripting-based tools, and machine learning tools for structure prediction of a protein. In this chapter, we provide a comprehensive overview of proteins structure and use different data mining machine learning algorithms for protein structure prediction.

  • Research Article
  • Cited by 1
  • 10.1007/s00521-022-07868-0
Energy Profile Bayes and Thompson Optimized Convolutional Neural Network protein structure prediction
  • Oct 7, 2022
  • Neural Computing & Applications
  • Varanavasi Nallasamy + 1 more

In living organisms, proteins are considered as the executants of biological functions. Owing to its pivotal role played in protein folding patterns, comprehension of protein structure is a challenging issue. Moreover, owing to numerous protein sequence exploration in protein data banks and complication of protein structures, experimental methods are found to be inadequate for protein structural class prediction. Hence, it is very much advantageous to design a reliable computational method to predict protein structural classes from protein sequences. In the recent few years there has been an elevated interest in using deep learning to assist protein structure prediction as protein structure prediction models can be utilized to screen a large number of novel sequences. In this regard, we propose a model employing Energy Profile for atom pairs in conjunction with the Legion-Class Bayes function called Energy Profile Legion-Class Bayes Protein Structure Identification model. Followed by this, we use a Thompson Optimized convolutional neural network to extract features between amino acids and then the Thompson Optimized SoftMax function is employed to extract associations between protein sequences for predicting secondary protein structure. The proposed Energy Profile Bayes and Thompson Optimized Convolutional Neural Network (EPB-OCNN) method tested distinct unique protein data and was compared to the state-of-the-art methods, the Template-Based Modeling, Protein Design using Deep Graph Neural Networks, a deep learning-based S-glutathionylation sites prediction tool called a Computational Framework, the Deep Learning and a distance-based protein structure prediction using deep learning. The results obtained when applied with the Biopython tool with respect to protein structure prediction time, protein structure prediction accuracy, specificity, recall, F-measure, and precision, respectively, are measured. 
The proposed EPB-OCNN method outperformed the state-of-the-art methods, thereby corroborating the objective.

  • Research Article
  • Cited by 1
  • 10.1155/2022/1650693
Comparative Study on Feature Selection in Protein Structure and Function Prediction.
  • Oct 11, 2022
  • Computational and mathematical methods in medicine
  • Wenjing Yi + 5 more

Many effective methods extract and fuse different protein features to study the relationship between protein sequence, structure, and function, but different methods have different preferences when addressing protein structure and function problems, which requires selecting valuable and contributing features to design more effective prediction methods. This work focused on feature selection methods in the study of protein structure and function, and systematically compared and analyzed the efficiency of different feature selection methods in the prediction of protein structures, protein disorders, protein molecular chaperones, and protein solubility. The results show that the feature selection method based on nonlinear SVM performs best in protein structure prediction, protein molecular chaperone prediction, and protein solubility prediction. After selection, the accuracy of features is improved by 13.16% to 71%, especially for the Kmer features and PSSM features of proteins.

  • Research Article
  • 10.9734/ajb2t/2025/v11i3247
Optimization in Protein Structure and Function Prediction of Silk of Tasar Silkworm, Antheraea mylitta Drury
  • Aug 2, 2025
  • Asian Journal of Biotechnology and Bioresource Technology
  • Ananda Rukmini + 2 more

Protein structure and function prediction is an arduous task in computational biology. Understanding the structure of protein facilitates in understanding its function. The Tasar silkworm, Antheraea mylitta Drury, is a vital species in sericulture, producing high-quality silk fibres with unique properties. Beyond its economic significance, the Tasar silkworm is a model organism for studying insect biology, development, and evolution. Despite recent advances in genomics and proteomics, which have enabled the exploration of the Tasar silkworm proteome, there still exist hitches in the atomic accuracy of its protein structure prediction in the era of deep learning. Here, we endeavour to bridge the existing knowledge gap of algorithmic modelling techniques, especially deep neural networks in protein prediction methods of this valuable lepidopteran, leading to a better understanding of its biological processes and potential applications.

  • Conference Article
  • Cited by 10
  • 10.1109/icccnt.2013.6726753
Bioinformatics: Protein structure prediction
  • Jul 1, 2013
  • Chandrayani N Rokde + 1 more

Proteins are essential parts of our life and participate in virtually every process within a cell. Understanding protein structures is vital to determining the function of a protein. Protein structure prediction (PSP) from amino acid sequence is one of the high-focus problems in bioinformatics today, because the biological function of a protein is determined by its three-dimensional structure. Thus, protein structure prediction is a fundamental area of computational biology. Its importance is intensified by the large amounts of sequence data coming from PDB (Protein Data Bank) and by the fact that experimental methods such as X-ray crystallography or Nuclear Magnetic Resonance (NMR), which are used to determine protein structures, remain very expensive and time-consuming. To minimize the time, computational methods are used for the protein folding and structure prediction problem. In this paper, results for protein p53 are discussed.

  • Research Article
  • Cited by 86
  • 10.1007/s10930-021-10003-y
Protein Structure Prediction: Conventional and Deep Learning Perspectives.
  • May 28, 2021
  • The Protein Journal
  • V A Jisna + 1 more

Protein structure prediction is a way to bridge the sequence-structure gap, one of the main challenges in computational biology and chemistry. Predicting any protein's accurate structure is of paramount importance for the scientific community, as these structures govern their function. Moreover, this is one of the complicated optimization problems that computational biologists have ever faced. Experimental protein structure determination methods include X-ray crystallography, Nuclear Magnetic Resonance Spectroscopy and Electron Microscopy. All of these are tedious and time-consuming procedures that require expertise. To make the process less cumbersome, scientists use predictive tools as part of computational methods, using data consolidated in the protein repositories. In recent years, machine learning approaches have raised the interest of the structure prediction community. Most of the machine learning approaches for protein structure prediction are centred on co-evolution based methods. The accuracy of these approaches depends on the number of homologous protein sequences available in the databases. The prediction problem becomes challenging for many proteins, especially those without enough sequence homologs. Deep learning methods allow for the extraction of intricate features from protein sequence data without making any intuitions. Accurately predicted protein structures are employed for drug discovery, antibody designs, understanding protein-protein interactions, and interactions with other molecules. This article provides a review of conventional and deep learning approaches in protein structure prediction. We conclude this review by outlining a few publicly available datasets and deep learning architectures currently employed for protein structure prediction tasks.

  • Research Article
  • Cited by 6
  • 10.1186/1472-6807-13-s1-s10
An aggregate analysis of many predicted structures to reduce errors in protein structure comparison caused by conformational flexibility
  • Nov 1, 2013
  • BMC Structural Biology
  • Brian G Godshall + 3 more

Background: Conformational flexibility creates errors in the comparison of protein structures. Even small changes in backbone or sidechain conformation can radically alter the shape of ligand binding cavities. These changes can cause structure comparison programs to overlook functionally related proteins with remote evolutionary similarities, and cause others to incorrectly conclude that closely related proteins have different binding preferences, when their specificities are actually similar. Towards the latter effort, this paper applies protein structure prediction algorithms to enhance the classification of homologous proteins according to their binding preferences, despite radical conformational differences. Methods: Specifically, structure prediction algorithms can be used to "remodel" existing structures against the same template. This process can return proteins in very different conformations to similar, objectively comparable states. Operating on close homologs exploits the accuracy of structure predictions on closely related proteins, but structure prediction is often a nondeterministic process. Identical inputs can generate subtly different models with very different binding cavities that make structure comparison difficult. We present a first method to mitigate such errors, called "medial remodeling", that examines a large number of predicted structures to eliminate extreme models of the same binding cavity. Results: Our results, on the enolase and tyrosine kinase superfamilies, demonstrate that remodeling can enable proteins in very different conformations to be returned to states that can be objectively compared. Structures that would have been erroneously classified as having different binding preferences were often correctly classified after remodeling, while structures that would have been correctly classified as having different binding preferences almost always remained distinct. 
The enolase superfamily, which exhibited less sequential diversity than the tyrosine kinase superfamily, was classified more accurately after remodeling than the tyrosine kinases. Medial remodeling reduced errors from models with unusual perturbations that distort the shape of the binding site, enhancing classification accuracy. Conclusions: This paper demonstrates that protein structure prediction can compensate for conformational variety in the comparison of protein-ligand binding sites. While protein structure prediction introduces new uncertainties into the structure comparison problem, our results indicate that unusual models can be ignored through an analysis of many models, using techniques like medial remodeling. These results point to applications of protein structure comparison that extend beyond existing crystal structures.

  • Research Article
  • Cited by 3
  • 10.5897/ajb08.009
Is protein structure prediction still an enigma
  • Dec 29, 2008
  • AFRICAN JOURNAL OF BIOTECHNOLOGY
  • K Sobha + 2 more

Proteins are large molecules indispensable for the existence and proper functioning of biological organisms. They perform a wide array of functions including catalysis, structure formation, transport, body defense, etc. Understanding the functions of proteins is a fundamental problem in the discovery of drugs to treat various diseases. The structure of a protein can be determined by physical methods which are slow and expensive but owing to the dramatic increase in the numbers of proteins sent to the public data bank during the last few years, it is highly desirable to develop some rapid and effective computational methods to predict the structure of new proteins so as to expedite the process of deducing their function. All the structure prediction methods basically rely on the idea that there is a correlation between residue sequence and structure. The primary structure is unique for each protein and it is generally accepted that a protein’s primary structure is enough to determine its folding process to secondary, tertiary and quaternary structure. Despite recent efforts to develop automated protein structure determination protocols, structural genomic projects are slow in generating fold assignments for complete proteomes, and spatial structures remain unknown for many protein families. Alternative cheap and fast methods to assign folds using prediction algorithms continue to provide valuable structural information for many proteins. Protein structure determination and prediction has been a focal research subject in life sciences due to the importance of protein structure in understanding the biological and chemical activities of organisms/cell. 
This review covers various recent advanced methods for protein structure prediction, such as a two-stage method for assigning residues to one of the three secondary structure states, prediction of homo-oligomeric proteins based on the nearest neighbour algorithm, a sequence-based hidden Markov model, practical ab initio methods aimed at finding the native structure of the protein by simulating the biological process of protein folding, and metapredictors based on consensus from multiple methods.

  • Research Article
  • Cited by 40
  • 10.1186/1472-6807-6-18
Revealing divergent evolution, identifying circular permutations and detecting active-sites by protein structure comparison
  • Jan 1, 2006
  • BMC Structural Biology
  • Luonan Chen + 4 more

Background: Protein structure comparison is one of the most important problems in computational biology and plays a key role in protein structure prediction, fold family classification, motif finding, phylogenetic tree reconstruction and protein docking. Results: We propose a novel method to compare protein structures in an accurate and efficient manner. Such a method can be used not only to reveal divergent evolution, but also to identify circular permutations and further detect active sites. Specifically, we define the structure alignment as a multi-objective optimization problem, i.e., maximizing the number of aligned atoms and minimizing their root mean square distance. By controlling a single distance-related parameter, theoretically we can obtain a variety of optimal alignments corresponding to different optimal matching patterns, i.e., from a large matching portion to a small matching portion. The number of variables in our algorithm increases with the number of atoms of protein pairs in almost a linear manner. In addition to a solid theoretical background, numerical experiments demonstrated significant improvement of our approach over the existing methods in terms of quality and efficiency. In particular, we show that divergent evolution, circular permutations and active sites (or structural motifs) can be identified by our method. The software SAMO is available upon request from the authors. Conclusion: A novel formulation is proposed to accurately align protein structures in the framework of multi-objective optimization, based on a sequence order-independent strategy. A fast and accurate algorithm based on the bipartite matching algorithm is developed by exploiting the special features. Convergence of computation is shown in experiments and is also theoretically proven.
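
The two objectives named in this abstract, more aligned atoms and a lower root mean square distance, pull against each other, and a distance cutoff controls the balance. A minimal sketch of that trade-off follows. It assumes the two structures are already superimposed and uses a greedy closest-pair matching as a simple stand-in for the bipartite matching algorithm the abstract mentions. The names `greedy_align` and `rmsd` and the coordinates are hypothetical and are not taken from SAMO.

```python
import math


def greedy_align(a, b, cutoff):
    """Pair atoms of `a` with atoms of `b` that lie within `cutoff`.

    Closest pairs are matched first and each atom is used at most once.
    Pairing ignores sequence order. This is a greedy stand-in, not an
    optimal bipartite matching.
    """
    candidates = []
    for i, p in enumerate(a):
        for j, q in enumerate(b):
            d = math.dist(p, q)
            if d <= cutoff:
                candidates.append((d, i, j))
    candidates.sort()

    used_a, used_b, pairs = set(), set(), []
    for _, i, j in candidates:
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            pairs.append((i, j))
    return pairs


def rmsd(a, b, pairs):
    """Root mean square distance over the aligned atom pairs."""
    if not pairs:
        raise ValueError("no aligned pairs")
    total = sum(math.dist(a[i], b[j]) ** 2 for i, j in pairs)
    return math.sqrt(total / len(pairs))


# Hypothetical, already superimposed coordinates (arbitrary units).
a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
b = [(0.0, 0.1, 0.0), (1.0, 0.0, 0.2), (5.0, 0.0, 0.0)]

for cutoff in (0.5, 3.5):
    pairs = greedy_align(a, b, cutoff)
    print(cutoff, len(pairs), round(rmsd(a, b, pairs), 3))
```

In this toy example the tighter cutoff aligns fewer atoms with a lower RMSD, while the looser cutoff aligns all three atoms at the cost of a higher RMSD.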

  • Research Article
  • Cited by 2
  • 10.2174/18750362-v16-e230711-2023-2
Comparative Functional Genomics Studies for Understanding the Hypothetical Proteins in Mycobacterium Tuberculosis Variant Microti 12
  • Jul 26, 2023
  • The Open Bioinformatics Journal
  • Tejaswini Vijay Shinde + 7 more

Background: The Mycobacterium tuberculosis complex (MTBC) bacteria include the slowly growing, host-associated bacteria Mycobacterium tuberculosis, Mycobacterium bovis, Mycobacterium microti, Mycobacterium africanum, and Mycobacterium pinnipedii. Aim: Comparative Functional Genomics Studies for understanding the Hypothetical Proteins in Mycobacterium tuberculosis variant microti 12. Objective: A computational genomics study was performed to understand the 247 hypothetical protein genes. Functional annotation of virtual proteins was performed on different servers to maximize confidence level. Methods: Sequence Retrieval. The whole genome sequences for Mycobacterium tuberculosis variant microti 12 were retrieved from the KEGG database (http://www.genome.jp/kegg/) and were used for screening 247 hypothetical proteins (Fig. 1). Functional Annotation and Sub-cellular localization. The Mycobacterium tuberculosis variant microti 12 hypothetical proteins were screened and sorted out from the genome and were individually analyzed for the presence of conserved functional domains by using computational biology tools like CDD-BLAST (https://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi) and Pfam (http://pfam.xfam.org/ncbiseq/398365647). The subcellular localization of hypothetical proteins was determined by CELLO2GO (http://cello.life.nctu.edu.tw). These web tools can search the defined conserved domains in the sequences available in the online servers or databases and assist in the classification of proteins in the appropriate families. Protein Structure Prediction. The in-silico structure predictions of the hypothetical protein sequences showing functional properties were carried out by using the PS2 Protein Structure Prediction Server (http://www.ps2.life.nctu.edu.tw/). The online server helps to generate the 3D structures of the hypothetical proteins. The server accepts the sequences in FASTA format as a query to generate the resulting 3D protein structures. 
The structure determination is completely based on the conserved template regions detected during functional annotations. Protein-protein interaction through the STRING database: Each hypothetical protein analyzed for functional characteristics was submitted to a protein-protein interaction server (https://string-db.org/) to predict a possible functional role in interaction amongst the available known proteins. This information can help us to further validate the functional role of such hypothetical proteins and their possible role in the Mycobacterium tuberculosis variant microti. Protein secondary structure prediction through JPred4: The secondary structure prediction of all the hypothetical proteins was determined through JPred4 (http://www.compbio.dundee.ac.uk/jpred4/index.html) and served to identify the available secondary structures in the unknown hypothetical protein sequences. These further help us to understand the available templates in the uncharacterized protein sequences for the prediction of novel functions associated with these proteins. The predictions were further characterized by the Phyre2 server for structural modeling and prediction of templates based on comparative analysis of conserved domains. Protein modeling, prediction, and analysis through Phyre2. The hypothetical proteins which were identified to have functional properties were further characterized by the Phyre2 server (http://www.sbg.bio.ic.ac.uk/phyre2) for structural modeling and prediction of templates based on comparative analysis of conserved domains. Results: A computational genomics study was performed to understand the 247 hypothetical protein genes. Functional annotation of virtual proteins was performed on different servers to maximize confidence level. The functional prediction was performed by CDD-BLAST and Pfam. 
The gene sequences of proteins have probably been successfully functionally annotated and characterized, and their subcellular localization and 3-D structures have been predicted computationally. Online automated bioinformatics tools such as CDD-BLAST, Pfam, CELLO2GO and PS2-Server were used for the structural and functional characterization of screened hypothetical proteins. The structure, function, and subcellular localization of a hypothetical protein from Mycobacterium tuberculosis variant microti 12 have been obtained and presented (Fig. 2). Also, the three-dimensional structure generated after using the template with the highest score was displayed as the template ID in the structure column of the respective hypothetical protein. However, as systems biology denies hypothetical protein functions, the structures of such proteins can be tested through biological processes and experiments, making them suitable for understanding their role in the life cycle, pathogenesis, and drug development. We can further explore these predictive possibilities in pharmaceuticals and other clinically relevant studies. This study by HP helped find structure-function relationships in Mycobacterium tuberculosis variant microti 12 using a variety of bioinformatics tools. The STRING database made predictions about protein-protein interactions, and the template helped us predict a hypothetical protein structure and even helped us find its 3D protein structure. Protein profiling can be performed on structures retrieved from these servers. This is useful for proteomics studies, including protein-protein interactions, protein expression of specific hypothetical proteins, and post-translational modifications of protein-coding genes. Further understanding of these hypothetical proteins can help us to know more about the Mycobacterium tuberculosis complex (MTBC) and may assist in developing drugs and inhibitors against different pathogens within this complex. 
Conclusion: The all-inclusive bioinformatic study has helped to functionally elucidate 247 hypothetical proteins, which has made it easier to understand many functional proteins available in Mycobacterium tuberculosis variant microti 12. The subcellular localization of the 247 sorted hypothetical proteins was also carried out, which further helped us understand the localization of identified enzymes or proteins. We have successfully characterized the 247 unknown hypothetical protein sequences from Mycobacterium tuberculosis variant microti 12 to validate the structure and functions of the gene products. These predicted functions and three-dimensional structures may lead to establishing their role in the life cycle of the bacterium. This computationally generated data can also be further used for developing new protocols for new vaccines against Mycobacterium tuberculosis variant microti 12 that are essential for preventing infection, diseases, and transmission. This complete set of hypothetical protein results is needed for further studies of the whole genome of Mycobacterium tuberculosis variant microti 12 and the interpretation of its functions, which further helps in understanding its functions as well as its structure. Moreover, this interpretation would help us to study the evolution of Mycobacterium tuberculosis variant microti 12, which further helps in the process of discovering drugs to inhibit the causes of diseases.

  • Research Article
  • Cited by 345
  • 10.1016/j.str.2011.09.022
Atomic-Level Protein Structure Refinement Using Fragment-Guided Molecular Dynamics Conformation Sampling
  • Dec 1, 2011
  • Structure
  • Jian Zhang + 2 more

  • Research Article
  • Cited by 22
  • 10.11234/gi1990.15.2_181
Predicting protein secondary structure by a support vector machine based on a new coding scheme.
  • Jul 11, 2011
  • Genome Informatics
  • Juan Liu + 3 more

Protein structure prediction is one of the most important problems in modern computational biology. Protein secondary structure prediction is a key step in the prediction of protein tertiary structure. Many methods based on machine learning techniques, such as neural networks (NN) and support vector machines (SVM), have emerged for predicting secondary structure. In this paper, a new method based on SVM is proposed. Unlike existing methods, it takes into account the physical-chemical properties and structural properties of amino acids. When tested on the most popular dataset, CB513, it achieved a Q(3) accuracy of 0.7844, which indicates that it is among the top-ranking methods for protein secondary structure prediction.
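
The Q(3) figure quoted in this abstract (often written Q3) is the fraction of residues whose predicted three-state label matches the observed one. A minimal sketch of that calculation follows, assuming the common H (helix), E (strand), C (coil) alphabet. The function name `q3_accuracy` and the two example strings are hypothetical and are not data from the paper.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted state equals the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Hypothetical 16-residue example; the strings differ at two positions.
observed = "CCHHHHHHCCEEEECC"
predicted = "CCHHHHHCCCEEEHCC"

print(q3_accuracy(predicted, observed))  # 14 of 16 residues agree -> 0.875
```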

  • Research Article
  • Cited by 6
  • 10.1128/jb.185.14.3990-3993.2003
Pretty good guessing: protein structure prediction at CASP5.
  • Jul 15, 2003
  • Journal of Bacteriology
  • Rosemarie Swanson + 1 more

In this special issue of the Journal of Bacteriology, bacteriologists look into the smallest organisms even deeper than before, down to the molecular level. The focus is on experimentally determined molecular structures. However, structure prediction from amino acid sequence data is becoming a usable source of protein structure information as well. Interest in protein structure prediction is old, but success is new. About 9 years ago, John Moult and others organized the first effort known as Critical Assessment of Protein Structure Prediction (CASP). They arranged with experimentalists to provide amino acid sequence information for soon-to-be-determined protein structures and invited the protein prediction community to try their methods on these target unknowns. Predictors submitted their results to the organizers for evaluation against the true structures when they became available. The format of using a community-wide experiment and a meeting to present the evaluations to the predictors propelled the improvement of methods. Last December the fifth evaluation meeting of the biennial CASP effort (CASP5) was held at Asilomar Conference Grounds in Pacific Grove, Calif. (7). The success of the best of the predictors in the last two CASP evaluations (7, 8) warrants mention of the methods and results here. Methods for prediction are different for easy and hard cases. The choice of method depends on the degree of similarity between the amino acid sequence of the unknown and the sequences of known structures.

THE HARDEST TEST

Even though they have the worst agreement with the experimental results, the most exciting predictions are the successes in the “new fold” category, where the sequence of the unknown has no significant similarity to the sequence of any known structure. Five of the eighty-odd domains available for prediction fell into this category in CASP5. 
In this most difficult category, the evaluator considered that at least one “excellent” prediction was made for each target. Of 165 predictors who attempted these difficult targets, nine had a prediction among the best ten (out of hundreds) for three or more of the targets. So some techniques consistently perform better than the rest. In the new fold category, a respectable result means that the predicted chain has the same kinds of pieces in the same relative orientations, not that the pieces superimpose on each other. The degree of agreement might be similar to that between photographs of the same person at age 20 and at age 80. In fact, predicting a new fold is like drawing a face that the artist has never seen. And in fact, structure prediction methods are like the methods used by police artists, in an important sense. A witness is shown a gallery of faces and asked to pick out parts from them that individually resemble parts of the suspect’s face. The police artist then combines the parts into a whole that resembles the witness’s memory of the face. The most successful methods of structure prediction for new folds similarly rely on the assembly of a unique whole from fragments selected from a gallery of protein structures.

ONE OF THE GOOD METHODS

In a coarse description of the most successful method of new fold prediction, the first step is to obtain secondary structure (helix, beta strand, etc.) predictions for the unknown and to divide the sequence of the unknown into short fragments (nine amino acids). Then known structures (the equivalent of the gallery of faces) are searched for fragments that are similar in secondary structure and/or sequence profile to the unknown’s fragments. A library of these fragments from known structures is constructed (the equivalent of the collection of witness-selected individual features).
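The fragment-library step can be illustrated with a small toy sketch. It is not the published method: the scoring function (`window_score`), the use of overlapping windows, the `top_k` cutoff, and the representation of the "gallery" as plain sequence and secondary-structure strings are all assumptions made for illustration. Real methods compare sequence profiles rather than raw residue identities.

```python
from typing import Dict, List, Tuple

FRAGMENT_LENGTH = 9  # fragment length named in the text

def window_score(seq_a: str, ss_a: str, seq_b: str, ss_b: str,
                 ss_weight: float = 2.0) -> float:
    """Toy similarity (assumption): weighted secondary-structure matches
    plus plain residue identities."""
    ss_matches = sum(a == b for a, b in zip(ss_a, ss_b))
    seq_matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return ss_weight * ss_matches + seq_matches

def build_fragment_library(
    target_seq: str,
    target_ss: str,
    gallery: List[Tuple[str, str, str]],
    top_k: int = 3,
) -> Dict[int, List[Tuple[float, str, int]]]:
    """For each 9-residue window of the target, keep the top_k best-scoring
    windows found in the gallery of known structures.

    gallery holds (name, sequence, secondary_structure) triples.
    Returns {target_start: [(score, name, gallery_start), ...]}.
    """
    if len(target_seq) != len(target_ss):
        raise ValueError("sequence and secondary structure must have equal length")
    library: Dict[int, List[Tuple[float, str, int]]] = {}
    for start in range(len(target_seq) - FRAGMENT_LENGTH + 1):
        t_seq = target_seq[start:start + FRAGMENT_LENGTH]
        t_ss = target_ss[start:start + FRAGMENT_LENGTH]
        candidates: List[Tuple[float, str, int]] = []
        for name, g_seq, g_ss in gallery:
            for g_start in range(len(g_seq) - FRAGMENT_LENGTH + 1):
                score = window_score(
                    t_seq, t_ss,
                    g_seq[g_start:g_start + FRAGMENT_LENGTH],
                    g_ss[g_start:g_start + FRAGMENT_LENGTH],
                )
                candidates.append((score, name, g_start))
        # Best score first; ties broken by name and position for determinism.
        candidates.sort(key=lambda c: (-c[0], c[1], c[2]))
        library[start] = candidates[:top_k]
    return library
```

In this sketch the library maps each window position in the unknown to a short ranked list of candidate fragments, which mirrors the "collection of witness-selected features" in the analogy.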
The starting guess for the unknown structure is a completely extended chain (equivalent to the blank paper), but randomly selected suitable fragments repeatedly replace sections of the extended chain. After each fragment placement (“move”), the chain is checked for collisions and other bad and good features, and the move is rejected or accepted. After a large number (thousands) of fragment placements, a folded chain has been created (the equivalent of a single face). In contrast to the limited number of faces an artist could produce, however, tens of thousands of candidate structures are produced. The candidate structures are clustered according to their structural similarity to each other, and the centers of the few largest clusters are selected as the best candidate structures. Final adjustments to the candidates are made to make the models more physically realistic. The method’s increasing power lies in the improving selection of the contents of the fragment library and in the improving rules for accepting or rejecting a fragment placement. (For further detail and other methods, see reference 7.) In CASP5, the method just described was used effectively not only for new folds but also for loop regions in unknowns where a structure for a related sequence was available. The loops were modeled by the new fold method, but otherwise the prediction was closely guided by the template (“comparative modeling”). Why use a template?
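The fragment-placement loop described above can be sketched as a small Monte Carlo search. This is a deliberately simplified toy, not the method used at CASP5: the chain is a two-dimensional string of unit-length bonds, each fragment is a hypothetical list of nine bend angles, `toy_score` is an invented stand-in for the real accept/reject rules (it penalizes collisions and rewards compactness), and the Metropolis acceptance rule is an assumption. Clustering of the candidate structures and final refinement are omitted.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

def build_coords(turns: Sequence[float]) -> List[Tuple[float, float]]:
    """Place unit-length bonds in the plane; turns[i] is the bend in radians."""
    x, y, heading = 0.0, 0.0, 0.0
    coords = [(x, y)]
    for turn in turns:
        heading += turn
        x += math.cos(heading)
        y += math.sin(heading)
        coords.append((x, y))
    return coords

def toy_score(turns: Sequence[float], clash_distance: float = 0.7) -> float:
    """Lower is better: a heavy penalty per collision plus the squared
    radius of gyration (a crude compactness term)."""
    coords = build_coords(turns)
    n = len(coords)
    clashes = 0
    for i in range(n):
        for j in range(i + 2, n):  # skip directly bonded neighbours
            if math.dist(coords[i], coords[j]) < clash_distance:
                clashes += 1
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    rg2 = sum((p[0] - cx) ** 2 + (p[1] - cy) ** 2 for p in coords) / n
    return 10.0 * clashes + rg2

def assemble(n_residues: int,
             library: Dict[int, List[List[float]]],
             n_moves: int = 2000,
             temperature: float = 1.0,
             seed: int = 0) -> Tuple[List[float], float]:
    """Start from a fully extended chain and repeatedly try fragment
    replacements, accepting or rejecting each move.

    library maps a start position to candidate fragments (lists of bends);
    every fragment must fit inside the chain.
    """
    rng = random.Random(seed)
    turns = [0.0] * n_residues          # extended chain: the "blank paper"
    current = toy_score(turns)
    best_turns, best = list(turns), current
    positions = sorted(library)
    for _ in range(n_moves):
        start = rng.choice(positions)
        fragment = rng.choice(library[start])
        trial = list(turns)
        trial[start:start + len(fragment)] = fragment   # the "move"
        trial_score = toy_score(trial)
        delta = trial_score - current
        # Metropolis rule (assumption): always accept improvements,
        # sometimes accept a worse chain to escape local traps.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            turns, current = trial, trial_score
            if current < best:
                best_turns, best = list(turns), current
    return best_turns, best
```

Running `assemble` many times with different seeds would give the large pool of candidate structures that the text says are then clustered by structural similarity.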
