Machine Learning Framework: Predicting Protein Structural Features
Structural biology is a challenging scientific discipline that aims to uncover the topologies and shapes of biological macromolecules, that is, DNA, RNA, and proteins. Proteins are large macromolecules consisting of one or more chains of amino acids joined linearly by peptide bonds. Proteins are essential to organisms and take part in virtually all biological processes of cells. They catalyze biochemical reactions (enzymes) and act as structural constituents, signaling molecules, and molecular machines in every biological system. They are responsible for immune responses, can store molecules (e.g., casein and ovalbumin store amino acids), and are even responsible for cell mechanics (e.g., actin and myosin). Protein structure prediction is a difficult task and a fundamental problem in computational biology, structural science, and structural biology. The problem has four levels: (1) one-dimensional (1D) prediction of structural features along the linear chain of amino acids; (2) two-dimensional (2D) prediction of spatial relationships between amino acids; (3) three-dimensional (3D) prediction of the tertiary structure of a protein; and (4) four-dimensional (4D) prediction of the quaternary structure of multiprotein complexes. Researchers have recently applied a variety of data mining methods, scripting-based tools, and machine learning tools to protein structure prediction. In this chapter, we provide a comprehensive overview of protein structure and of the data mining and machine learning algorithms used for protein structure prediction.
- Research Article
- 10.1142/s021800142250015x
- Mar 14, 2022
- International Journal of Pattern Recognition and Artificial Intelligence
Predicting the three-dimensional structure of a protein has received growing attention in computational molecular biology. Most recent research has aimed at exploring the search space; however, with the increasing size and variety of data, protein structure identification and prediction are still at a preliminary stage. This work aims to explore the search space to tackle protein structure prediction with minimum execution time and maximum accuracy by means of quantile regressive dragonfly and structural class homolog-based deep learning (QRD-SCHDL). The proposed QRD-SCHDL method consists of two distinct steps: protein structure identification and protein structure prediction. In the first step, protein structure identification is performed by means of a QRD optimization model to identify protein structure with minimum error. Identification is performed first because the raw database contains sequence information but no structural information, so an optimization model is designed to obtain the structural information from the database. Because protein structure gives much more insight than sequence alone, the second step performs the actual computational prediction of protein structure from sequence via structural class and homolog-based deep learning. For each protein structure prediction, a scoring matrix is obtained by utilizing the structural class maximum correlation coefficient. Finally, the proposed method is tested on sets of different numbers of unique protein data and compared to state-of-the-art methods. The results show the potential of the proposed method in terms of error rate, protein structure prediction time, protein structure prediction accuracy, precision, specificity, recall, ROC, Kappa coefficient, and [Formula: see text]-measure.
The results also show that the proposed QRD-SCHDL method attains comparable results and outperforms the other methods in certain cases, indicating the efficiency of the proposed work.
- Research Article
- 10.1016/s0968-0004(99)01450-4
- Nov 1, 1999
- Trends in Biochemical Sciences
Biology in silico – a mixed bag: Computational Methods in Molecular Biology (New Comprehensive Biochemistry Vol. 32) edited by S. L. Salzberg, D. B. Searls and S. Kasif
- Research Article
1
- 10.1360/n972016-00658
- Aug 1, 2016
- Chinese Science Bulletin
Protein folding is the process by which a protein molecule transforms from a linear polypeptide chain into a three-dimensional native structure with a specific biological function. The protein folding problem has been studied for more than 50 years and has become a broad and active research field. To answer the 58th question raised by Science in 2005, in this article we briefly review the background and research history of the protein folding problem and introduce the progress of protein folding prediction research from four aspects: the protein folding process prediction (protein folding simulation), the folding process related parameter prediction, the protein folding result prediction (protein structure prediction), and the folding result related parameter prediction. Studies of the protein folding problem began in the 1960s, with efforts to resolve the paradox that a protein can form its native 3D structure in only several seconds, whereas the time scale estimated from a thermodynamic ergodic hypothesis would be longer than the age of the universe. Computer simulation is an important approach to studying protein folding. Protein models can be classified into three categories: lattice models, off-lattice models, and all-atom models. Current knowledge about the protein folding mechanism is based on the concept of a folding funnel on a free-energy landscape, and the current opinion is that the folding mechanism is not unique across the whole protein universe and that there may exist a continuum between the two extremes of hierarchical folding and nucleation folding. The hardware for protein folding simulation has become more powerful: distributed systems (e.g., Folding@home), special-purpose machines (e.g., ANTON), and GPU-based platforms have been developed for protein folding simulation. Meanwhile, folding simulation software has been continuously enhanced.
An important issue in protein folding simulation is overcoming local energy barriers to find the global energy minimum; several approaches, such as replica exchange, multi-scale modeling, and Modeling Employing Limited Data (MELD), were developed to tackle this issue, and human involvement (e.g., the “Foldit” game) is another interesting effort. During the past two decades, the capability of protein folding simulation has risen continuously. Currently, folding simulations of proteins with dozens of amino acids can reach the millisecond time scale, while the largest proteins for which folding can be simulated effectively have around 100 amino acids. The targets of protein folding simulation have expanded considerably and now include both in vitro and in vivo folding, such as co-translational folding, chaperone-assisted folding, small-molecule-induced folding, and metal-coupled folding. Folding rate and folding type are two important parameters related to the protein folding process, and they can now be predicted by statistical and machine-learning approaches based on different levels of structural features, such as the topological properties of tertiary structure, the contents of secondary structure, and the amino acid frequencies of primary structure. The result of a protein folding process is the formation of a protein structure. According to the hierarchy of structural organization, the protein structure prediction problem includes secondary structure prediction, tertiary structure prediction, and quaternary structure prediction. Secondary structure prediction algorithms have gone through five generations, and the current accuracy is about 80% for three-class prediction. Tertiary structure prediction approaches mainly fall into two categories, template-based modeling and free modeling, with the former having higher accuracy and the latter having a wider scope of application.
Quaternary structure prediction includes the prediction of complex structure and the prediction of the likelihood of protein-protein interaction, and these predictions can be performed based on protein 3D structure or on amino acid sequence alone. Structure-related parameter prediction has also attracted research interest, including the prediction of protein structural classes, secondary structure contents, disordered regions, solvent-accessible surface regions, and the amino acid contact pairs in protein-protein interaction interfaces. Finally, some possible directions for future protein folding research are suggested: the coupling between protein folding and binding, the fusion of protein folding research with systems biology, and the application of deep-learning techniques in protein folding prediction.
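The folding-time paradox mentioned in this abstract can be made concrete with a rough back-of-the-envelope calculation. The sketch below is illustrative only: the number of conformational states per residue (3), the sampling rate (10^13 conformations per second), and the 100-residue chain length are assumptions chosen for the arithmetic, not values taken from the article.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 1.38e10  # approximate value, used only for comparison


def exhaustive_search_years(n_residues, states_per_residue=3, samples_per_second=1e13):
    """Years needed to visit every conformation once (all parameters are assumptions)."""
    conformations = states_per_residue ** n_residues  # exact integer
    return conformations / samples_per_second / SECONDS_PER_YEAR


years = exhaustive_search_years(100)
ratio = years / AGE_OF_UNIVERSE_YEARS
print(f"exhaustive search: {years:.2e} years ({ratio:.1e} times the age of the universe)")
```

Even under these generous assumptions the exhaustive search is many orders of magnitude slower than observed folding, which is why folding is described in terms of a funnel on a free-energy landscape rather than a random search.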
- Research Article
1
- 10.1007/s00521-022-07868-0
- Oct 7, 2022
- Neural Computing & Applications
In living organisms, proteins are the executors of biological functions. Because protein structure plays a pivotal role in protein folding patterns, understanding it is important yet challenging. Moreover, owing to the large number of protein sequences accumulating in protein data banks and the complexity of protein structures, experimental methods are inadequate for protein structural class prediction. Hence, it is advantageous to design a reliable computational method to predict protein structural classes from protein sequences. In recent years there has been heightened interest in using deep learning to assist protein structure prediction, as protein structure prediction models can be used to screen a large number of novel sequences. In this regard, we propose a model employing an Energy Profile for atom pairs in conjunction with the Legion-Class Bayes function, called the Energy Profile Legion-Class Bayes Protein Structure Identification model. We then use a Thompson Optimized convolutional neural network to extract features between amino acids, and the Thompson Optimized SoftMax function is employed to extract associations between protein sequences for predicting protein secondary structure. The proposed Energy Profile Bayes and Thompson Optimized Convolutional Neural Network (EPB-OCNN) method was tested on distinct unique protein data and compared to the following state-of-the-art methods: Template-Based Modeling, Protein Design using Deep Graph Neural Networks, a deep learning-based S-glutathionylation sites prediction tool called a Computational Framework, Deep Learning, and distance-based protein structure prediction using deep learning. The results, obtained with the Biopython tool, are measured with respect to protein structure prediction time, protein structure prediction accuracy, specificity, recall, F-measure, and precision.
The proposed EPB-OCNN method outperformed the state-of-the-art methods, thereby corroborating the objective.
- Book Chapter
4
- 10.1016/s1574-1400(08)00003-0
- Jan 1, 2008
- Annual Reports in Computational Chemistry
Chapter 3 - Machine Learning for Protein Structure and Function Prediction
- Research Article
29
- 10.1371/journal.pone.0092863
- Mar 27, 2014
- PLoS ONE
Protein structure prediction is critical to functional annotation of the massively accumulated biological sequences, which prompts an imperative need for the development of high-throughput technologies. As a first and key step in protein structure prediction, protein structural class prediction has become an increasingly challenging task. For most homology-based approaches, the accuracy of protein structural class prediction is sufficiently high for high similarity datasets, but still far from satisfactory for low similarity datasets, i.e., below 40% in pairwise sequence similarity. Therefore, we present a novel method for accurate and reliable protein structural class prediction for both high and low similarity datasets. This method is based on Support Vector Machine (SVM) in conjunction with integrated features from position-specific score matrix (PSSM), PROFEAT and Gene Ontology (GO). A feature selection approach, SVM-RFE, is also used to rank the integrated feature vectors by recursively removing the feature with the lowest ranking score. The top features selected by SVM-RFE are input into the SVM engines to predict the structural class of a query protein. To validate our method, jackknife tests were applied to seven widely used benchmark datasets, reaching overall accuracies between 84.61% and 99.79%, which are significantly higher than those achieved by state-of-the-art tools. These results suggest that our method could serve as an accurate and cost-effective alternative to existing methods in protein structural classification, especially for low similarity datasets.
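The jackknife test used for validation in this abstract can be sketched as leave-one-out evaluation: each protein is held out in turn, the classifier is trained on the rest, and the held-out protein is predicted. In the minimal sketch below a nearest-centroid classifier stands in for the SVM, and the two-dimensional feature vectors and class labels are hypothetical toy data, not the PSSM, PROFEAT, or GO features of the paper.

```python
from math import dist


def nearest_centroid_predict(train_x, train_y, query):
    """Predict the label whose class centroid is closest to the query vector."""
    sums, counts = {}, {}
    for x, y in zip(train_x, train_y):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [v / counts[y] for v in acc] for y, acc in sums.items()}
    return min(centroids, key=lambda y: dist(centroids[y], query))


def jackknife_accuracy(features, labels):
    """Leave-one-out accuracy: hold out each sample once and train on the others."""
    correct = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if nearest_centroid_predict(train_x, train_y, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)


# Hypothetical toy features for six proteins in two structural classes.
features = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.15], [0.1, 0.9], [0.2, 0.8], [0.15, 0.85]]
labels = ["all-alpha"] * 3 + ["all-beta"] * 3
accuracy = jackknife_accuracy(features, labels)
print(f"jackknife accuracy: {accuracy:.2%}")
```

Because every sample is tested exactly once and never seen during its own training round, the jackknife gives a single, deterministic accuracy for a given dataset.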
- Dissertation
- 10.31274/etd-180810-863
- Apr 28, 2012
There is a critical need for protein structure and function prediction. Accurate protein secondary structure prediction is essential for many bioinformatics applications, including protein tertiary structure prediction. We developed an algorithm (Fragment Data Mining, FDM) for protein secondary structure prediction using fragments of known structures obtained by multiple sequence alignment (MSA). Its performance is excellent where high-score MSA matches are available. By combining it with GOR V, a new Consensus Database Mining (CDM) method was developed, which surpasses the performance of both FDM and GOR V. For each residue, it uses the result of either FDM or GOR V, depending on the availability of high-score MSA matches. A server has been set up to make CDM publicly accessible. It is becoming more popular due to the reliability and efficiency of its performance, the simplicity of its use, and its potential for improvement with the rapidly growing number of determined structures. Phosphorylation is the most important post-translational modification for cellular regulation and signal transduction. Upon phosphorylation, proteins can undergo pronounced conformational changes. It is challenging to characterize these changes because of the high flexibility of phosphorylation regions and the difficulties in obtaining diffraction-quality crystals. In the current study, we focused on the conformational changes of CDK2 due to phosphorylation at Thr160. We use Cα-Cβ-side chain (CABS) modeling, Targeted Molecular Dynamics (TMD) and conventional molecular dynamics (MD) to simulate the structural transition and create transition pathways. Principal component analysis (PCA) of the trajectories and normal mode analysis (NMA) with anisotropic network model (ANM – an
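The per-residue selection rule described for CDM in this abstract (use the FDM result where high-score MSA matches exist, otherwise fall back to GOR V) can be sketched as follows. This is an illustration under assumptions: the function name, the numeric match scores, and the 0.5 threshold are hypothetical, and the abstract does not specify how match quality is actually measured.

```python
def consensus_prediction(fdm_states, gor_states, match_scores, threshold):
    """Per residue, take the FDM state when its match score is high, else the GOR V state."""
    if not (len(fdm_states) == len(gor_states) == len(match_scores)):
        raise ValueError("all inputs must cover the same residues")
    return "".join(
        fdm if score >= threshold else gor
        for fdm, gor, score in zip(fdm_states, gor_states, match_scores)
    )


# Hypothetical predictions for a 10-residue stretch (H = helix, E = strand, C = coil).
fdm = "HHHHCCEEEE"
gor = "HHCCCCCEEE"
scores = [0.9, 0.9, 0.8, 0.2, 0.1, 0.1, 0.7, 0.9, 0.9, 0.3]
consensus = consensus_prediction(fdm, gor, scores, threshold=0.5)
print(consensus)
```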
- Research Article
- 10.17762/ijritcc.v11i9.9194
- Nov 5, 2023
- International Journal on Recent and Innovation Trends in Computing and Communication
The protein molecule is a large biological molecule found in living organisms. Proteins perform several tasks in the human body, such as transporting molecules, catalysing metabolic reactions, and responding to stimuli. Protein structure analysis and prediction is essential to any research on a given protein molecule. The basic aim of protein structure prediction (PSP) is to predict the three-dimensional structure generated by the amino acid sequence. Notably, only twenty amino acids are found in living organisms, whereas approximately one lakh (100,000) protein molecules can be formed from these amino acids combined in different proportions. The three-dimensional structure formed by the amino acids generally changes its shape and size under the effect of external agents or medicines that come into contact with the protein molecule. A basic motivation for predicting protein structure is the design of new drugs or medicines. From the structures, researchers developing medicines can more easily detect changes in the living body or the need for drugs or medicines. Detecting the structure and predicting it accurately is always a challenging task. The protein structure is basically a three-dimensional structure built from secondary-structure elements, which may take the form of α-helices, β-sheets, loops, etc. In this paper, the secondary structures are identified, the percentages of α-helix, β-sheet, and loop structures are predicted, and the complexities that may arise during prediction are discussed. Deep neural networks are a form of deep structured learning, which belongs to the broader family of machine learning methods. Deep learning architectures have a number of applications in fields such as medical science, bioinformatics, and medical image analysis.
A novel method is proposed in this research article for the detection, correction, and removal of various complexities during prediction using a deep neural network. This technique will be helpful for researchers working in drug design and medicine research.
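The secondary structure percentages discussed in this abstract are straightforward to compute once each residue has a state label. The sketch below assumes a three-letter encoding (H for α-helix, E for β-sheet, C for loop) and a hypothetical 20-residue assignment; it shows only the bookkeeping, not the deep neural network that would produce the labels.

```python
from collections import Counter


def secondary_structure_content(states):
    """Percentage of residues in each of the three states H, E, and C."""
    if not states:
        raise ValueError("state string must not be empty")
    counts = Counter(states)
    return {label: 100.0 * counts.get(label, 0) / len(states) for label in "HEC"}


# Hypothetical per-residue assignment for a 20-residue protein.
content = secondary_structure_content("CCHHHHHHEEEECCCCHHHH")
print(content)
```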
- Research Article
14
- 10.3390/cryst11040324
- Mar 24, 2021
- Crystals
In the postgenomic age, rapid growth in the number of sequence-known proteins has been accompanied by much slower growth in the number of structure-known proteins (as a result of experimental limitations), and a widening gap between the two is evident. Because protein function is linked to protein structure, successful prediction of protein structure is of significant importance in protein function identification. Foreknowledge of protein structural class can help improve protein structure prediction with significant medical and pharmaceutical implications. Thus, a fast, suitable, reliable, and reasonable computational method for protein structural class prediction has become pivotal in bioinformatics. Here, we review recent efforts in protein structural class prediction from protein sequence, with particular attention paid to new feature descriptors, which extract information from protein sequence, and the use of machine learning algorithms in both feature selection and the construction of new classification models. These new feature descriptors include amino acid composition, sequence order, physicochemical properties, multiprofile Bayes, and secondary structure-based features. Machine learning methods, such as artificial neural networks (ANNs), support vector machine (SVM), K-nearest neighbor (KNN), random forest, deep learning, and examples of their application are discussed in detail. We also present our view on possible future directions, challenges, and opportunities for the applications of machine learning algorithms for prediction of protein structural classes.
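Of the feature descriptors listed in this review, amino acid composition is the simplest: a sequence is mapped to a 20-dimensional vector of residue frequencies that a classifier such as an SVM or KNN can consume. The following is a minimal sketch; the example sequence is hypothetical, and skipping non-standard residue codes is an assumption of this sketch rather than a convention stated in the review.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def amino_acid_composition(sequence):
    """Return the 20 residue frequencies, in AMINO_ACIDS order, summing to 1."""
    residues = [r for r in sequence.upper() if r in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    return [residues.count(aa) / len(residues) for aa in AMINO_ACIDS]


composition = amino_acid_composition("MKVLAAGIVG")  # hypothetical 10-residue sequence
print(dict(zip(AMINO_ACIDS, composition)))
```

The resulting vector has a fixed length regardless of sequence length, which is what makes it usable as classifier input, at the cost of discarding all sequence-order information.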
- Research Article
1
- 10.1038/npre.2011.6693.1
- Dec 13, 2011
- Nature Precedings
Enormous computational efforts have been made to predict the structure and function of proteins. However, nearly all of these efforts have focused on predicting function from the primary nucleic acid sequence or modeling the 3D structure of a protein from its nucleic acid sequence. In fact, it seems that amino acid attributes, which form an intermediate phase between DNA/RNA and advanced protein structure, have been overlooked. Since 2010, we have examined the possibility of precise prediction of structural protein function based on amino acid features by improving the following aspects of amino acid research: (1) increasing the number of computationally calculated amino acid features, (2) testing different feature selection (attribute weighting) algorithms and selecting the most important amino acid attributes based on the overall conclusion of the algorithms, (3) examining different supervised and unsupervised data mining (machine learning) algorithms, and (4) joining attribute weighting with different data mining algorithms. We applied the resulting procedure to different biological examples, including protein thermostability, halostability, prediction of the function of heavy metal transporters, cancer diagnosis and prediction, and the study of EST-SSRs at the amino acid level. In the thermostability study, we successfully established an accurate expert system to predict the thermostability of any input sequence through mining of its calculated amino acid features.
Interestingly, the performance of a clustering algorithm such as EMC can vary from 0.0% to 100%, depending on which attribute weighting algorithm summarized the attributes of the dataset prior to running the clustering algorithm. In another recent study on halostability, the results showed that amino acid composition can be used to efficiently discriminate halostable protein groups with up to 98% accuracy, implying the possibility of precise prediction of halostability when an appropriate machine learning algorithm mines a large number of structural amino acid attributes of primary protein structure. Using our approach, simple amino acid features, without the need for advanced features of protein structure, could explain the difference between P1B-ATPases in hyperaccumulator and nonhyperaccumulator plants. More importantly, a precise model was built to discriminate P1B-ATPases in different organisms based on their structural amino acid features. In addition, for the first time, reliable models for prediction of the hyperaccumulating activity of unknown P1B-ATPase pumps were developed. We employed our method in monitoring and prediction of breast cancer. The results confirmed that amino acid composition can be used to discriminate between protein groups expressed in two forms of breast cancer: malignant and benign. This study provided strong evidence that malignancy can be predicted from amino acid composition, and that malignant proteins can be distinguished based on the amino acid composition of their proteomes without further need for protein separation. An important outcome was the discovery of the role of dipeptides, in particular Ile-Ile, in cancer progression.
In addition, Generalized Rule Induction (GRI) found association rules in the data showing the 100 most important rules classifying benign, malignant, and commonly expressed proteins in breast cancers. In another investigation, we found that EST-SSRs in normal lung tissues differ from those in unhealthy tissues, and that ESTs tagged with SSRs cause remarkable differences in amino acid and protein expression patterns in cancerous tissue. This can be regarded as a first glimpse of a new sort of biomarker based on the frequency of amino acids. Up to now, phylogenetic trees, drawn from nucleic acid or amino acid sequence alignments, have been employed as the basis of evolutionary studies. However, this method does not take into account the structural and functional features of sequences during evolution. In contrast, the classification presented here, based on the decision tree, anomaly detection model and feature weighting, provides an evolutionary separation of organisms based on the structural reasons for this diversity. Our findings have the potential to be used efficiently in the following areas: filling the gap between laboratory engineering of proteins and computational biology, developing amino acid feature-based biomarkers, increasing the accuracy of prediction of 3D protein structure based on important amino acid features, and developing websites/software for prediction of the results of mutation. In addition, important discovered amino acid features can be employed as clues for discovering important DNA mutations and increasing prediction accuracy of 3D structure from DNA sequence. Furthermore, this study offers new for protein function, irrespective of similarity searches.
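Dipeptide features such as the Ile-Ile pair highlighted in this abstract can be obtained by counting overlapping residue pairs along a sequence. The sketch below shows that counting step only; the example sequence is hypothetical, and the attribute weighting and data mining algorithms applied to such features in the study are not reproduced here.

```python
from collections import Counter


def dipeptide_counts(sequence):
    """Count overlapping residue pairs; 'II' is the Ile-Ile dipeptide in one-letter code."""
    sequence = sequence.upper()
    return Counter(sequence[i:i + 2] for i in range(len(sequence) - 1))


pairs = dipeptide_counts("MIIKIIL")  # hypothetical 7-residue sequence
print(pairs["II"], "Ile-Ile pairs out of", sum(pairs.values()))
```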
- Research Article
- 10.9734/ajb2t/2025/v11i3247
- Aug 2, 2025
- Asian Journal of Biotechnology and Bioresource Technology
Protein structure and function prediction is an arduous task in computational biology. Understanding the structure of a protein facilitates understanding its function. The Tasar silkworm, Antheraea mylitta Drury, is a vital species in sericulture, producing high-quality silk fibres with unique properties. Beyond its economic significance, the Tasar silkworm is a model organism for studying insect biology, development, and evolution. Despite recent advances in genomics and proteomics, which have enabled the exploration of the Tasar silkworm proteome, obstacles remain to atomic-level accuracy in predicting its protein structures in the era of deep learning. Here, we endeavour to bridge the existing knowledge gap in algorithmic modelling techniques, especially deep neural networks, in protein prediction methods for this valuable lepidopteran, leading to a better understanding of its biological processes and potential applications.
- Research Article
20
- 10.1089/cmb.2019.0193
- Aug 11, 2019
- Journal of Computational Biology
The folding of a protein structure is a process governed by both local and nonlocal interactions. While incorporating local dependencies into a machine learning algorithm for protein structure prediction is simple and has been exploited for some time, the modeling of long-range dependences which result from structurally-neighboring residues has only recently begun to be addressed. Structural properties designed to localize the prediction space from direct tertiary structure prediction, such as secondary structure, contact maps, and intrinsic disorder, among others, have begun to greatly benefit from machine learning models capable of modeling a widened, potentially global protein context. This has led to a direct enhancement of the quality of predicted tertiary structures through both the optimization of structural constraints and improved reliability of alignments to structural templates. These improvements have stemmed from the application of recurrent and convolutional neural network architectures effective not only at innate sequential context propagation but also deep feature extraction due to novel skip connections and normalization techniques allowing for greatly enhanced error back-propagation. The recent results from independent blind testing in Critical Assessment of protein Structure Prediction 13 have signaled the beginning of a new generation of protein structure prediction through the utilization of these contextual techniques. The ripples from advancements in the determination of one-dimensional and two-dimensional structural properties have us moving ever closer to the solution of the protein structure prediction problem.
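A contact map, one of the two-dimensional structural properties named in this abstract, records which residue pairs lie close together in space. The sketch below derives one from coordinates; the 8 Å distance cutoff is a commonly used convention assumed here, and the four coordinates are hypothetical values chosen to keep the example readable.

```python
from math import dist


def contact_map(coords, threshold=8.0, min_separation=1):
    """Binary matrix: 1 where two residues lie within `threshold` angstroms of each other."""
    n = len(coords)
    return [
        [
            1 if abs(i - j) >= min_separation and dist(coords[i], coords[j]) <= threshold else 0
            for j in range(n)
        ]
        for i in range(n)
    ]


# Hypothetical positions of four residues along a line, in angstroms.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
cmap = contact_map(coords)
for row in cmap:
    print(row)
```

Raising `min_separation` filters out trivially adjacent residues, which is useful when only the long-range contacts discussed in the abstract are of interest.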
- Conference Article
10
- 10.1109/icccnt.2013.6726753
- Jul 1, 2013
Proteins are essential parts of our life and participate in virtually every process within a cell. Understanding protein structures is vital to determining the function of a protein. Protein structure prediction (PSP) from amino acid sequence is one of the most actively studied problems in bioinformatics today, because the biological function of a protein is determined by its three-dimensional structure. Thus, protein structure prediction is a fundamental area of computational biology. Its importance is heightened by the large amounts of sequence data coming from the PDB (Protein Data Bank) and by the fact that experimental methods used to determine protein structures, such as X-ray crystallography and Nuclear Magnetic Resonance (NMR), remain very expensive and time consuming. To reduce the time required, computational methods are used for the protein folding and structure prediction problem. In this paper, results for the protein p53 are discussed.
- Research Article
3
- 10.5897/ajb08.009
- Dec 29, 2008
- AFRICAN JOURNAL OF BIOTECHNOLOGY
Proteins are large molecules indispensable for the existence and proper functioning of biological organisms. They perform a wide array of functions including catalysis, structure formation, transport, body defense, etc. Understanding the functions of proteins is a fundamental problem in the discovery of drugs to treat various diseases. The structure of a protein can be determined by physical methods which are slow and expensive but owing to the dramatic increase in the numbers of proteins sent to the public data bank during the last few years, it is highly desirable to develop some rapid and effective computational methods to predict the structure of new proteins so as to expedite the process of deducing their function. All the structure prediction methods basically rely on the idea that there is a correlation between residue sequence and structure. The primary structure is unique for each protein and it is generally accepted that a protein’s primary structure is enough to determine its folding process to secondary, tertiary and quaternary structure. Despite recent efforts to develop automated protein structure determination protocols, structural genomic projects are slow in generating fold assignments for complete proteomes, and spatial structures remain unknown for many protein families. Alternative cheap and fast methods to assign folds using prediction algorithms continue to provide valuable structural information for many proteins. Protein structure determination and prediction has been a focal research subject in life sciences due to the importance of protein structure in understanding the biological and chemical activities of organisms/cell. 
This review covers various recent advanced methods for protein structure prediction, such as a two-stage method for assigning each residue to one of the three secondary structure states, prediction of homo-oligomeric proteins based on the nearest neighbour algorithm, sequence-based hidden Markov models, practical ab initio methods aimed at finding the native structure of the protein by simulating the biological process of protein folding, and metapredictors based on consensus from multiple methods.
- Research Article
6
- 10.1128/jb.185.14.3990-3993.2003
- Jul 15, 2003
- Journal of Bacteriology
In this special issue of the Journal of Bacteriology, bacteriologists look into the smallest organisms even deeper than before, down to the molecular level. The focus is on experimentally determined molecular structures. However, structure prediction from amino acid sequence data is becoming a usable source of protein structure information as well. Interest in protein structure prediction is old, but success is new. About 9 years ago, John Moult and others organized the first effort known as Critical Assessment of Protein Structure Prediction (CASP). They arranged with experimentalists to provide amino acid sequence information for soon-to-be-determined protein structures and invited the protein prediction community to try their methods on these target unknowns. Predictors submitted their results to the organizers for evaluation against the true structures when they became available. The format of using a community-wide experiment and a meeting to present the evaluations to the predictors propelled the improvement of methods. Last December the fifth evaluation meeting of the biennial CASP effort (CASP5) was held at Asilomar Conference Grounds in Pacific Grove, Calif. (7). The success of the best of the predictors in the last two CASP evaluations (7, 8) warrants mention of the methods and results here. Methods for prediction are different for easy and hard cases. The choice of method depends on the degree of similarity between the amino acid sequence of the unknown and the sequences of known structures.
THE HARDEST TEST
Even though they have the worst agreement with the experimental results, the most exciting predictions are the successes in the “new fold” category, where the sequence of the unknown has no significant similarity to the sequence of any known structure. Five of the eighty-odd domains available for prediction fell into this category in CASP5.
In this most difficult category, the evaluator considered that at least one “excellent” prediction was made for each target. Of 165 predictors who attempted these difficult targets, nine had a prediction among the best ten (out of hundreds) for three or more of the targets. So some techniques consistently perform better than the rest. In the new fold category, a respectable result means that the predicted chain has the same kinds of pieces in the same relative orientations, not that the pieces superimpose on each other. The degree of agreement might be similar to that between photographs of the same person at age 20 and at age 80. In fact, predicting a new fold is like drawing a face that the artist has never seen. And in fact, structure prediction methods are like the methods used by police artists, in an important sense. A witness is shown a gallery of faces and asked to pick out parts from them that individually resemble parts of the suspect’s face. The police artist then combines the parts into a whole that resembles the witness’s memory of the face. The most successful methods of structure prediction for new folds similarly rely on the assembly of a unique whole from fragments selected from a gallery of protein structures.
ONE OF THE GOOD METHODS
In a coarse description of the most successful method of new fold prediction, the first step is to obtain secondary structure (helix, beta strand, etc.) predictions for the unknown and to divide the sequence of the unknown into short fragments (nine amino acids). Then known structures (the equivalent of the gallery of faces) are searched for fragments that are similar in secondary structure and/or sequence profile to the unknown’s fragments. A library of these fragments from known structures is constructed (the equivalent of the collection of witness-selected individual features).
The starting guess for the unknown structure is a completely extended chain (equivalent to the blank paper), but randomly selected suitable fragments repeatedly replace sections of the extended chain. After each fragment placement (“move”), the chain is checked for collisions and other bad and good features, and the move is rejected or accepted. After a large number (thousands) of fragment placements, a folded chain has been created (the equivalent of a single face). In contrast to the limited number of faces an artist could produce, however, tens of thousands of candidate structures are produced. The candidate structures are clustered according to their structural similarity to each other, and the centers of the few largest clusters are selected as the best candidate structures. Final adjustments to the candidates are made to make the models more physically realistic. The method’s increasing power lies in the improving selection of the contents of the fragment library and in the improving rules for accepting or rejecting a fragment placement. (For further detail and other methods, see reference 7.) In CASP5, the method just described was used effectively not only for new folds but also for loop regions in unknowns where a structure for a related sequence was available. The loops were modeled by the new fold method, but otherwise the prediction was closely guided by the template (“comparative modeling”). Why use a template?
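The fragment-placement loop described above (start from an extended chain, repeatedly swap in library fragments, and accept or reject each move) can be sketched in a heavily simplified form. Everything concrete below is an assumption made for illustration: the chain is reduced to a string of secondary structure states, fragments are three residues long rather than nine, the fragment library is hypothetical, and the score simply counts disagreements with the predicted secondary structure instead of checking collisions and other physical features. Clustering of the candidate structures is omitted.

```python
import random


def mismatch_score(chain, predicted_ss):
    """Toy score: number of residues whose state disagrees with the predicted state."""
    return sum(c != p for c, p in zip(chain, predicted_ss))


def assemble(predicted_ss, fragment_library, n_moves=200, seed=0):
    """Fragment assembly: try random fragment placements, keep those that do not worsen the score."""
    rng = random.Random(seed)
    chain = "-" * len(predicted_ss)  # starting guess: fully extended chain
    best = mismatch_score(chain, predicted_ss)
    starts = sorted(fragment_library)
    for _ in range(n_moves):
        start = rng.choice(starts)
        fragment = rng.choice(fragment_library[start])
        candidate = chain[:start] + fragment + chain[start + len(fragment):]
        score = mismatch_score(candidate, predicted_ss)
        if score <= best:  # accept the move; otherwise the chain is left unchanged
            chain, best = candidate, score
    return chain, best


# Hypothetical predicted secondary structure and fragment library (start index -> fragments).
predicted_ss = "HHHHHHEEE"
fragment_library = {0: ["HHH", "EEE"], 3: ["HHH", "HHE"], 6: ["EEE", "HHH"]}
model, score = assemble(predicted_ss, fragment_library)
print(model, score)
```

In a realistic setting the accept/reject rule and the contents of the fragment library carry most of the method's power, as the text notes; this sketch only shows the control flow of repeated placement and evaluation.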