
Energy Profile Bayes and Thompson Optimized Convolutional Neural Network protein structure prediction

Abstract

In living organisms, proteins are the executants of biological functions. Owing to the pivotal role protein structure plays in folding patterns, comprehending it is a challenging issue. Moreover, given the large number of protein sequences accumulating in protein data banks and the complexity of protein structures, experimental methods are inadequate for protein structural class prediction. Hence, a reliable computational method for predicting protein structural classes from protein sequences is highly advantageous. In recent years there has been growing interest in using deep learning to assist protein structure prediction, since protein structure prediction models can be used to screen a large number of novel sequences. In this regard, we propose a model that employs an Energy Profile for atom pairs in conjunction with the Legion-Class Bayes function, called the Energy Profile Legion-Class Bayes Protein Structure Identification model. Following this, we use a Thompson Optimized convolutional neural network to extract features between amino acids, and then employ the Thompson Optimized SoftMax function to extract associations between protein sequences for predicting protein secondary structure. The proposed Energy Profile Bayes and Thompson Optimized Convolutional Neural Network (EPB-OCNN) method was tested on distinct protein data and compared with state-of-the-art methods: Template-Based Modeling, Protein Design using Deep Graph Neural Networks, a deep learning-based S-glutathionylation site prediction tool called a Computational Framework, Deep Learning, and a distance-based protein structure prediction method using deep learning. The results, obtained with the Biopython tool, were measured in terms of protein structure prediction time, protein structure prediction accuracy, specificity, recall, F-measure, and precision. The proposed EPB-OCNN method outperformed the state-of-the-art methods, thereby corroborating the objective.
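
The evaluation metrics named in the abstract (accuracy, specificity, recall, precision, F-measure) can all be derived from confusion-matrix counts. The following is a minimal sketch for the binary case only; the function name and the counts are hypothetical and are not taken from the paper.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute standard classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # also called sensitivity
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # F-measure (F1) is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "accuracy": accuracy,
        "f_measure": f_measure,
    }


# Hypothetical counts, chosen only for illustration.
metrics = binary_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

For a multi-class problem such as structural class prediction, the same quantities would typically be computed per class and then averaged.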

Similar Papers
  • Research Article
  • 10.1142/s021800142250015x
Protein Structure Prediction Using Quantile Dragonfly and Structural Class-Based Deep Learning
  • Mar 14, 2022
  • International Journal of Pattern Recognition and Artificial Intelligence
  • Varanavasi Nallasamy + 1 more

Predicting the three-dimensional structure of a protein has received growing attention in computational molecular biology. Most recent research has aimed at exploring the search space; however, with the increasing size of the data, protein structure identification and prediction are still at a preliminary stage. This work aims to explore the search space to tackle protein structure prediction with minimum execution time and maximum accuracy by means of quantile regressive dragonfly and structural class homolog-based deep learning (QRD-SCHDL). The proposed QRD-SCHDL method consists of two distinct steps: protein structure identification and protein structure prediction. In the first step, protein structure identification is performed by means of a QRD optimization model to identify protein structure with minimum error. Identification is performed first because the raw database contains sequence information but no structural information, so an optimization model is designed to obtain the structural information from the database. However, protein structure gives much more insight than its sequence. Therefore, the second step performs the actual computational prediction of protein structure from sequence via structural class and homolog-based deep learning. For each protein structure prediction, a scoring matrix is obtained by utilizing the structural class maximum correlation coefficient. Finally, the proposed method is tested on sets of different numbers of unique protein data and compared to the state-of-the-art methods. The obtained results showed the potential of the proposed method in terms of error rate, protein structure prediction time, protein structure prediction accuracy, precision, specificity, recall, ROC, Kappa coefficient and [Formula: see text]-measure. They also show that the proposed QRD-SCHDL method attains comparable results and outperforms the other methods in certain cases, thereby signifying the efficiency of the proposed work.
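
Among the metrics listed above, the Kappa coefficient is perhaps the least familiar. As a minimal sketch, assuming it refers to Cohen's kappa (agreement between predicted and true labels, corrected for chance), it can be computed as follows; the labels are hypothetical secondary-structure classes and are not data from the paper.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: probability that both label a sample with the same class
    # if each followed its own marginal class frequencies independently.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical labels: H = helix, E = strand, C = coil.
y_true = ["H", "H", "E", "E", "C", "C"]
y_pred = ["H", "H", "E", "C", "C", "C"]
kappa = cohen_kappa(y_true, y_pred)
print(round(kappa, 3))
```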

  • Research Article
  • Cited by 29
  • 10.1371/journal.pone.0092863
PSSP-RFE: Accurate Prediction of Protein Structural Class by Recursive Feature Extraction from PSI-BLAST Profile, Physical-Chemical Property and Functional Annotations
  • Mar 27, 2014
  • PLoS ONE
  • Liqi Li + 7 more

Protein structure prediction is critical to functional annotation of the massively accumulated biological sequences, which prompts an imperative need for the development of high-throughput technologies. As a first and key step in protein structure prediction, protein structural class prediction becomes an increasingly challenging task. Amongst most homological-based approaches, the accuracies of protein structural class prediction are sufficiently high for high similarity datasets, but still far from being satisfactory for low similarity datasets, i.e., below 40% in pairwise sequence similarity. Therefore, we present a novel method for accurate and reliable protein structural class prediction for both high and low similarity datasets. This method is based on Support Vector Machine (SVM) in conjunction with integrated features from position-specific score matrix (PSSM), PROFEAT and Gene Ontology (GO). A feature selection approach, SVM-RFE, is also used to rank the integrated feature vectors through recursively removing the feature with the lowest ranking score. The definitive top features selected by SVM-RFE are input into the SVM engines to predict the structural class of a query protein. To validate our method, jackknife tests were applied to seven widely used benchmark datasets, reaching overall accuracies between 84.61% and 99.79%, which are significantly higher than those achieved by state-of-the-art tools. These results suggest that our method could serve as an accurate and cost-effective alternative to existing methods in protein structural classification, especially for low similarity datasets.
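
The jackknife test used for validation above is a leave-one-out procedure: each protein is held out in turn, the classifier is built on the rest, and the held-out protein is predicted. The sketch below illustrates only that evaluation loop. It swaps in a 1-nearest-neighbour classifier as a stand-in for the SVM engine, and the two-dimensional feature vectors and class labels are hypothetical.

```python
import math


def nearest_neighbor_label(query, samples):
    """Return the label of the training sample closest to the query (Euclidean distance)."""
    features, label = min(samples, key=lambda s: math.dist(query, s[0]))
    return label


def jackknife_accuracy(dataset):
    """Leave-one-out accuracy: hold out each sample once and predict it from the others."""
    correct = 0
    for i, (features, label) in enumerate(dataset):
        training = dataset[:i] + dataset[i + 1:]
        if nearest_neighbor_label(features, training) == label:
            correct += 1
    return correct / len(dataset)


# Hypothetical feature vectors and structural-class labels.
toy_dataset = [
    ((0.9, 0.1), "alpha"),
    ((0.8, 0.2), "alpha"),
    ((0.1, 0.9), "beta"),
    ((0.2, 0.8), "beta"),
    ((0.4, 0.6), "alpha"),  # lies closer to the beta samples, so it is misclassified
]
print(jackknife_accuracy(toy_dataset))
```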

  • Research Article
  • Cited by 37
  • 10.1042/bst20130055
Centenary Award and Sir Frederick Gowland Hopkins Memorial Lecture. Protein folding, structure prediction and design.
  • Mar 20, 2014
  • Biochemical Society transactions
  • David Baker

I describe how experimental studies of protein folding have led to advances in protein structure prediction and protein design. I describe the finding that protein sequences are not optimized for rapid folding, the contact order-protein folding rate correlation, the incorporation of experimental insights into protein folding into the Rosetta protein structure production methodology and the use of this methodology to determine structures from sparse experimental data. I then describe the inverse problem (protein design) and give an overview of recent work on designing proteins with new structures and functions. I also describe the contributions of the general public to these efforts through the Rosetta@home distributed computing project and the FoldIt interactive protein folding and design game.

  • Research Article
  • Cited by 14
  • 10.3390/cryst11040324
Recent Advances in the Prediction of Protein Structural Classes: Feature Descriptors and Machine Learning Algorithms
  • Mar 24, 2021
  • Crystals
  • Lin Zhu + 2 more

In the postgenomic age, rapid growth in the number of sequence-known proteins has been accompanied by much slower growth in the number of structure-known proteins (as a result of experimental limitations), and a widening gap between the two is evident. Because protein function is linked to protein structure, successful prediction of protein structure is of significant importance in protein function identification. Foreknowledge of protein structural class can help improve protein structure prediction with significant medical and pharmaceutical implications. Thus, a fast, suitable, reliable, and reasonable computational method for protein structural class prediction has become pivotal in bioinformatics. Here, we review recent efforts in protein structural class prediction from protein sequence, with particular attention paid to new feature descriptors, which extract information from protein sequence, and the use of machine learning algorithms in both feature selection and the construction of new classification models. These new feature descriptors include amino acid composition, sequence order, physicochemical properties, multiprofile Bayes, and secondary structure-based features. Machine learning methods, such as artificial neural networks (ANNs), support vector machine (SVM), K-nearest neighbor (KNN), random forest, deep learning, and examples of their application are discussed in detail. We also present our view on possible future directions, challenges, and opportunities for the applications of machine learning algorithms for prediction of protein structural classes.
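
Amino acid composition, the first feature descriptor listed above, maps a sequence of any length to a fixed 20-dimensional vector of residue frequencies. A minimal sketch follows; the function name and the example sequence are hypothetical, and non-standard residue codes are simply ignored here, which is an assumption rather than a convention taken from the review.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def amino_acid_composition(sequence):
    """Return the fraction of each standard amino acid in the sequence (20 values)."""
    residues = [r for r in sequence.upper() if r in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    total = len(residues)
    return [residues.count(aa) / total for aa in AMINO_ACIDS]


# Hypothetical short peptide, for illustration only.
composition = amino_acid_composition("MKVLAAGA")
print(dict(zip(AMINO_ACIDS, composition)))
```

A vector like this can be fed directly to any of the classifiers mentioned above, although it discards all sequence-order information.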

  • Research Article
  • 10.1016/s0968-0004(99)01450-4
Biology in silico – a mixed bag : Computational Methods in Molecular Biology (New Comprehensive Biochemistry Vol. 32) edited by S. L. Salzberg, D. B. Searls and S. Kasif
  • Nov 1, 1999
  • Trends in Biochemical Sciences
  • L Aravind


  • Research Article
  • Cited by 17
  • 10.1371/journal.pone.0006254
PSPP: A Protein Structure Prediction Pipeline for Computing Clusters
  • Jul 16, 2009
  • PLoS ONE
  • Michael S Lee + 6 more

Background: Protein structures are critical for understanding the mechanisms of biological systems and, subsequently, for drug and vaccine design. Unfortunately, protein sequence data exceed structural data by a factor of more than 200 to 1. This gap can be partially filled by using computational protein structure prediction. While structure prediction Web servers are a notable option, they often restrict the number of sequence queries and/or provide a limited set of prediction methodologies. Therefore, we present a standalone protein structure prediction software package suitable for high-throughput structural genomic applications that performs all three classes of prediction methodologies: comparative modeling, fold recognition, and ab initio. This software can be deployed on a user's own high-performance computing cluster. Methodology/Principal Findings: The pipeline consists of a Perl core that integrates more than 20 individual software packages and databases, most of which are freely available from other research laboratories. The query protein sequences are first divided into domains either by domain boundary recognition or Bayesian statistics. The structures of the individual domains are then predicted using template-based modeling or ab initio modeling. The predicted models are scored with a statistical potential and an all-atom force field. The top-scoring ab initio models are annotated by structural comparison against the Structural Classification of Proteins (SCOP) fold database. Furthermore, secondary structure, solvent accessibility, transmembrane helices, and structural disorder are predicted. The results are generated in text, tab-delimited, and hypertext markup language (HTML) formats. So far, the pipeline has been used to study viral and bacterial proteomes. Conclusions: The standalone pipeline that we introduce here, unlike protein structure prediction Web servers, allows users to devote their own computing assets to process a potentially unlimited number of queries as well as perform resource-intensive ab initio structure prediction.

  • Research Article
  • Cited by 87
  • 10.1007/s10930-021-10003-y
Protein Structure Prediction: Conventional and Deep Learning Perspectives.
  • May 28, 2021
  • The Protein Journal
  • V A Jisna + 1 more

Protein structure prediction is a way to bridge the sequence-structure gap, one of the main challenges in computational biology and chemistry. Predicting any protein's accurate structure is of paramount importance for the scientific community, as these structures govern their function. Moreover, this is one of the most complicated optimization problems that computational biologists have ever faced. Experimental protein structure determination methods include X-ray crystallography, Nuclear Magnetic Resonance Spectroscopy and Electron Microscopy. All of these are tedious and time-consuming procedures that require expertise. To make the process less cumbersome, scientists use predictive tools as part of computational methods, using data consolidated in the protein repositories. In recent years, machine learning approaches have raised the interest of the structure prediction community. Most of the machine learning approaches for protein structure prediction are centred on co-evolution based methods. The accuracy of these approaches depends on the number of homologous protein sequences available in the databases. The prediction problem becomes challenging for many proteins, especially those without enough sequence homologs. Deep learning methods allow for the extraction of intricate features from protein sequence data without relying on prior intuition. Accurately predicted protein structures are employed for drug discovery, antibody design, and understanding protein-protein interactions and interactions with other molecules. This article provides a review of conventional and deep learning approaches in protein structure prediction. We conclude this review by outlining a few publicly available datasets and deep learning architectures currently employed for protein structure prediction tasks.

  • Book Chapter
  • Cited by 11
  • 10.1002/9780470015902.a0003031.pub2
Protein Structure Prediction
  • Aug 15, 2012
  • Encyclopedia of Life Sciences
  • Ambrish Roy + 1 more

The goal of protein structure prediction is to estimate the spatial position of every atom of protein molecules from the amino acid sequence by computational methods. Depending on the availability of homologous templates in the PDB library, structure prediction approaches are categorised into template‐based modelling (TBM) and free modelling (FM). While TBM is by far the only reliable method for high‐resolution structure prediction, challenges in the field include constructing the correct folds without using template structures and refining the template models closer to the native state when templates are available. Nevertheless, the usefulness of various levels of protein structure prediction has been convincingly demonstrated in biological and medical applications. Key Concepts: Evolution is a general principle to guide protein structure and function predictions. Proteins of similar sequence have similar 3D structure. The function of a protein is determined by its 3D structure. TBM using homologous templates has the highest accuracy. Template structures can be refined by combining multiple templates. Current physics‐based ab initio folding can only fold small proteins. Threading is an efficient tool for detecting distantly homologous templates. Membrane protein structure prediction is challenging due to the lack of templates. Disordered regions exist in proteins; they do not possess stable structure but have important functional implications.

  • Research Article
  • Cited by 6
  • 10.1016/j.entcs.2014.06.014
MASTERS: A General Sequence-based MultiAgent System for Protein TERtiary Structure Prediction
  • Jul 1, 2014
  • Electronic Notes in Theoretical Computer Science
  • Thiago Lipinski-Paes + 1 more


  • Research Article
  • Cited by 66
  • 10.1038/s41587-025-02654-4
Deep-learning-based single-domain and multidomain protein structure prediction with D-I-TASSER.
  • May 23, 2025
  • Nature biotechnology
  • Wei Zheng + 8 more

The dominant success of deep learning techniques on protein structure prediction has challenged the necessity and usefulness of traditional force field-based folding simulations. We proposed a hybrid approach, deep-learning-based iterative threading assembly refinement (D-I-TASSER), which constructs atomic-level protein structural models by integrating multisource deep learning potentials with iterative threading fragment assembly simulations. D-I-TASSER introduces a domain splitting and assembly protocol for the automated modeling of large multidomain protein structures. Benchmark tests and the most recent Critical Assessment of protein Structure Prediction (CASP15) experiments demonstrate that D-I-TASSER outperforms AlphaFold2 and AlphaFold3 on both single-domain and multidomain proteins. Large-scale folding experiments further show that D-I-TASSER could fold 81% of protein domains and 73% of full-chain sequences in the human proteome with results highly complementary to recently released models by AlphaFold2. These results highlight a new avenue to integrate deep learning with classical physics-based folding simulations for high-accuracy protein structure and function predictions that are usable in genome-wide applications.

  • Book Chapter
  • Cited by 1
  • 10.1002/9780470015902.a0006214.pub2
Protein Structure Prediction and Databases
  • Sep 15, 2014
  • Encyclopedia of Life Sciences
  • Olga V Kalinina + 1 more

Three‐dimensional structures of proteins are the key to understanding their molecular function. Protein structures are most reliably determined by experiment. Recent advances in experimental techniques have led to a large increase in the numbers of both protein sequences and 3D structures. Yet, the number of experimentally resolved protein 3D structures is three orders of magnitude lower than that of sequences. This calls for computer support of protein structure prediction. Today several databases complement the comparatively small set of experimentally resolved protein structures with much larger sets of protein models generated by computer. Key Concepts: Protein structure prediction relies heavily on the experimental data on protein structures; the volume of such data is the prime determinant for the quality of protein structure predictions. The three major types of methods for protein structure prediction are homology, or template‐based, modelling; fold recognition, or threading; and de novo, or ab initio, prediction. Homology modelling is the most reliable class of methods, but requires experimental knowledge of a structure of a homologous – and thus structurally similar – protein, called the template. Sensitive sequence similarity search tools are used for detection of potential templates. The protein structure is modelled step‐wise: (1) aligning the target protein to the template, (2) placing the aligned target residues onto their respective template residues, (3) placing the side chains of nonconserved residues, healing backbone breaks and modelling loops that form gaps in the alignment, and (4) refining the model. The two most popular computational tools for homology modelling are MODELLER and SWISS‐MODEL; the two protein model databases based on them are ModBase and the SWISS‐MODEL Repository, respectively. The Protein Modelling Portal unites data from these and other databases, and provides an independent system for model evaluation called CAMEO.

  • Research Article
  • Cited by 124
  • 10.1016/j.jbc.2021.100870
Toward the solution of the protein structure prediction problem
  • Jun 11, 2021
  • The Journal of Biological Chemistry
  • Robin Pearce + 1 more

Since Anfinsen demonstrated that the information encoded in a protein’s amino acid sequence determines its structure in 1973, solving the protein structure prediction problem has been the Holy Grail of structural biology. The goal of protein structure prediction approaches is to utilize computational modeling to determine the spatial location of every atom in a protein molecule starting from only its amino acid sequence. Depending on whether homologous structures can be found in the Protein Data Bank (PDB), structure prediction methods have been historically categorized as template-based modeling (TBM) or template-free modeling (FM) approaches. Until recently, TBM has been the most reliable approach to predicting protein structures, and in the absence of reliable templates, the modeling accuracy sharply declines. Nevertheless, the results of the most recent community-wide assessment of protein structure prediction experiment (CASP14) have demonstrated that the protein structure prediction problem can be largely solved through the use of end-to-end deep machine learning techniques, where correct folds could be built for nearly all single-domain proteins without using the PDB templates. Critically, the model quality exhibited little correlation with the quality of available template structures, as well as the number of sequence homologs detected for a given target protein. Thus, the implementation of deep-learning techniques has essentially broken through the 50-year-old modeling border between TBM and FM approaches and has made the success of high-resolution structure prediction significantly less dependent on template availability in the PDB library.

  • Conference Article
  • Cited by 10
  • 10.1109/icccnt.2013.6726753
Bioinformatics: Protein structure prediction
  • Jul 1, 2013
  • Chandrayani N Rokde + 1 more

Proteins are essential parts of our life and participate in virtually every process within a cell. The understanding of protein structures is vital to determine the function of a protein. Protein structure prediction (PSP) from amino acid sequence is one of the high-focus problems in bioinformatics today. This is due to the fact that the biological function of the protein is determined by its three-dimensional structure. Thus, protein structure prediction is a fundamental area of computational biology. Its importance is intensified by the large amounts of sequence data coming from the PDB (Protein Data Bank) and by the fact that experimental methods such as X-ray crystallography or Nuclear Magnetic Resonance (NMR), which are used to determine protein structures, remain very expensive and time consuming. To minimize the time required, computational methods are used for the protein folding and structure prediction problem. In this paper, results for the protein p53 are discussed.

  • Research Article
  • Cited by 10
  • 10.2174/1570178614666170511165837
Predicting Protein Structural Class for Low-Similarity Sequences via Novel Evolutionary Modes of PseAAC and Recursive Feature Elimination
  • Sep 29, 2017
  • Letters in Organic Chemistry
  • Liang Kong + 4 more

Background and Objective: Protein structural class prediction is a first and key step in protein structure prediction and has become an active research area in biochemistry and bioinformatics. An important aspect for this prediction task is exploring good feature representation. Prior works have demonstrated the effectiveness of the PSI-BLAST profile based feature extraction methods especially for low-similarity protein sequences. However, the prediction accuracies still remain limited. This highlights the need for keeping on exploring the potential of evolutionary information. Method: In this study, three novel sequence evolutionary modes of pseudo amino acid composition (PseAAC) are proposed and optimized by a two-stage feature selection process based on recursive feature elimination strategy. The selected top-ranking features are then fed into a linear kernel support vector machine classifier to predict the protein structure class. To evaluate the performance of the proposed method, jackknife tests are performed on three widely used low-similarity benchmark datasets (25PDB, 1189 and 640). Results: With comprehensive comparison with the current state-of-the-art methods, the proposed method achieves superior performance. The overall accuracies on 25PDB, 1189 and 640 datasets are 96.2%, 97.9% and 99.5%, which are 1.9%, 1.5% and 2.3% higher than previous best-performing method. Conclusion: The satisfactory prediction accuracies achieved by the proposed method are attributed to the specially designed sequence evolutionary modes of PseAAC and the effective feature selection strategy, which cover more discriminative sequence order information. It is anticipated that our method would be helpful in other prediction problems in protein research. Keywords: Feature selection, position specific score matrix, protein structural class, recursive feature elimination, sequence similarity, support vector machine.

  • Research Article
  • Cited by 5
  • 10.1038/s41598-021-92395-6
MULTICOM2 open-source protein structure prediction system powered by deep learning and distance prediction
  • Jun 23, 2021
  • Scientific Reports
  • Tianqi Wu + 4 more

Protein structure prediction is an important problem in bioinformatics and has been studied for decades. However, there are still few open-source comprehensive protein structure prediction packages publicly available in the field. In this paper, we present our latest open-source protein tertiary structure prediction system—MULTICOM2, an integration of template-based modeling (TBM) and template-free modeling (FM) methods. The template-based modeling uses sequence alignment tools with deep multiple sequence alignments to search for structural templates, which are much faster and more accurate than MULTICOM1. The template-free (ab initio or de novo) modeling uses the inter-residue distances predicted by DeepDist to reconstruct tertiary structure models without using any known structure as template. In the blind CASP14 experiment, the average TM-score of the models predicted by our server predictor based on the MULTICOM2 system is 0.720 for 58 TBM (regular) domains and 0.514 for 38 FM and FM/TBM (hard) domains, indicating that MULTICOM2 is capable of predicting good tertiary structures across the board. It can predict the correct fold for 76 CASP14 domains (95% regular domains and 55% hard domains) if only one prediction is made for a domain. The success rate is increased by 3% for both regular and hard domains if five predictions are made per domain. Moreover, the prediction accuracy of the pure template-free structure modeling method on both TBM and FM targets is very close to the combination of template-based and template-free modeling methods. This demonstrates that the distance-based template-free modeling method powered by deep learning can largely replace the traditional template-based modeling method even on TBM targets that TBM methods used to dominate and therefore provides a uniform structure modeling approach to any protein. Finally, on the 38 CASP14 FM and FM/TBM hard domains, MULTICOM2 server predictors (MULTICOM-HYBRID, MULTICOM-DEEP, MULTICOM-DIST) were ranked among the top 20 automated server predictors in the CASP14 experiment. After combining multiple predictors from the same research group as one entry, MULTICOM-HYBRID was ranked no. 5. The source code of MULTICOM2 is freely available at https://github.com/multicom-toolbox/multicom/tree/multicom_v2.0.
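
The TM-score quoted above measures how closely a predicted model matches the native structure, on a scale from 0 to 1. As a rough sketch of the standard definition, each aligned residue pair at distance d contributes 1 / (1 + (d / d0)^2), with d0 = 1.24 (L - 15)^(1/3) - 1.8 for a target of length L, and the sum is divided by L. The code below assumes the per-residue distances have already been obtained from an optimal superposition, which the real TM-score program searches for; the function names are hypothetical, and very short chains are rejected rather than handled.

```python
def tm_score_d0(target_length):
    """Length-dependent distance scale d0 (in angstroms) used by the TM-score."""
    if target_length < 22:
        # For very short chains the formula gives a tiny or negative d0;
        # this sketch does not handle that case.
        raise ValueError("this sketch assumes target_length >= 22")
    return 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8


def tm_score_from_distances(distances, target_length):
    """TM-score for a fixed superposition, given distances of aligned residue pairs."""
    d0 = tm_score_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    # Normalising by the target length penalises unaligned residues.
    return total / target_length


# Hypothetical example: 100-residue target, 80 residues aligned at 2.0 angstroms each.
score = tm_score_from_distances([2.0] * 80, target_length=100)
print(round(score, 3))
```

Because the sum is normalised by the target length rather than by the number of aligned pairs, a model that covers only part of the target cannot reach a score of 1.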
