Comparative evaluation of the prediction accuracy of AlphaFold and ESMFold for monomeric and dimeric proteins
We evaluated the prediction accuracy of three tools, the deep-learning-based AlphaFold2 and AlphaFold3 and the large language model-based ESMFold, using experimentally derived structures deposited in the Protein Data Bank between 2022 and 2024 and excluding entries with close homologs among structures released before 2022. Based on the criteria of sequence identity lower than 40% and query coverage <70%, 1666 monomeric and 994 dimeric proteins were selected as challenging targets for benchmarking. Our analysis showed that AlphaFold2 and AlphaFold3 correctly predicted 88% of monomeric structures and 77% of dimeric proteins. ESMFold, by contrast, accurately predicted 76% of the monomeric proteins and 41% of the dimeric proteins. Because most incorrect predictions involved nuclear magnetic resonance structures, the tools were also benchmarked on X-ray and cryo-electron microscopy structures, where the prediction accuracy of AlphaFold and ESMFold was 95% and 83%, respectively, for monomeric proteins. Overall, these findings demonstrate significant differences in the prediction accuracy of these machine learning (ML)-based tools for monomeric and dimeric proteins, highlighting their advantages and limitations. Finally, to facilitate access to the benchmarking data, we developed ProModEv (Protein Model Evaluation portal), an interactive web portal for systematic analysis of these benchmarking results, available at http://pdbi.nii.ac.in/ProModEv/.
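The target-selection step described in the abstract can be pictured as a simple threshold filter. The sketch below is illustrative only: the `BestHit` record, the function names, and the example values are hypothetical, and it assumes that a target counts as challenging when its closest pre-2022 match falls below both cutoffs. The abstract does not spell out how the two criteria are combined, so that reading may differ from the authors' actual procedure.

```python
from dataclasses import dataclass

# Cutoffs quoted in the abstract; combining them with AND is an assumption.
MAX_IDENTITY = 40.0  # percent sequence identity
MAX_COVERAGE = 70.0  # percent query coverage


@dataclass
class BestHit:
    """Hypothetical record: a target's closest match among pre-2022 PDB entries."""
    target_id: str
    identity: float  # percent
    coverage: float  # percent


def is_challenging(hit: BestHit) -> bool:
    # Assumed reading: keep the target only if both values are below the cutoffs.
    return hit.identity < MAX_IDENTITY and hit.coverage < MAX_COVERAGE


def select_targets(hits):
    """Return the IDs of targets that pass the assumed filter, in input order."""
    return [h.target_id for h in hits if is_challenging(h)]


# Made-up example values, not data from the study.
hits = [
    BestHit("T1", 25.0, 50.0),
    BestHit("T2", 55.0, 90.0),
    BestHit("T3", 35.0, 80.0),
]
selected = select_targets(hits)
```

With these invented values, only the first target passes, because the other two exceed at least one cutoff.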
- Research Article
28
- 10.1016/j.jmr.2023.107481
- May 20, 2023
- Journal of magnetic resonance (San Diego, Calif. : 1997)
Recent advances in molecular modeling of protein structures are changing the field of structural biology. AlphaFold-2 (AF2), an AI system developed by DeepMind, Inc., utilizes attention-based deep learning to predict models of protein structures with high accuracy relative to structures determined by X-ray crystallography and cryo-electron microscopy (cryoEM). Comparing AF2 models to structures determined using solution NMR data, both high similarities and distinct differences have been observed. Since AF2 was trained on X-ray crystal and cryoEM structures, we assessed how accurately AF2 can model small, monomeric, solution protein NMR structures which (i) were not used in the AF2 training data set, and (ii) did not have homologous structures in the Protein Data Bank at the time of AF2 training. We identified nine open-source protein NMR data sets for such “blind” targets, including chemical shift, raw NMR FID data, NOESY peak lists, and (for 1 case) 15N-1H residual dipolar coupling data. For these nine small (70–108 residues) monomeric proteins, we generated AF2 prediction models and assessed how well these models fit to these experimental NMR data, using several well-established NMR structure validation tools. In most of these cases, the AF2 models fit the NMR data nearly as well, or sometimes better than, the corresponding NMR structure models previously deposited in the Protein Data Bank. These results provide benchmark NMR data for assessing new NMR data analysis and protein structure prediction methods. They also document the potential for using AF2 as a guiding tool in protein NMR data analysis, and more generally for hypothesis generation in structural biology research.
- Research Article
23
- 10.1016/j.bios.2011.07.011
- Jul 18, 2011
- Biosensors and Bioelectronics
Enhancing immunoassay detection of antigens with multimeric protein Gs
- Peer Review Report
- 10.7554/elife.73862.sa0
- Feb 15, 2022
Editor's evaluation: Destabilizers of the thymidylate synthase homodimer accelerate its proteasomal degradation and inhibit cancer growth
- Peer Review Report
- 10.7554/elife.73862.sa1
- Feb 15, 2022
Decision letter: Destabilizers of the thymidylate synthase homodimer accelerate its proteasomal degradation and inhibit cancer growth
- Research Article
16
- 10.1002/pro.2361
- Sep 20, 2013
- Protein Science
Oligomeric proteins are more abundant in nature than monomeric proteins, and are involved in all biological processes. In the absence of an experimental structure, their subunits can be modeled from their sequence like monomeric proteins, but reliable procedures to build the oligomeric assembly are scarce. Template-based methods, which start from known protein structures, are commonly applied to model subunits. We present a method to model homodimers that relies on a structural alignment of the subunits, and test it on a set of 511 target structures recently released by the Protein Data Bank, taking as templates the earlier released structures of 3108 homodimeric proteins (H-set), and 2691 monomeric proteins that form dimer-like assemblies in crystals (M-set). The structural alignment identifies an H-set template for 97% of the targets, and in half of the cases, it yields a correct model of the dimer geometry and residue-residue contacts in the target. It also identifies an M-set template for most of the targets, and some of the crystal dimers are very similar to the target homodimers. The procedure efficiently detects homology at low levels of sequence identity, and points to erroneous quaternary structures in the Protein Data Bank. The high coverage of the target set suggests that the content of the Protein Data Bank already approaches the structural diversity of protein assemblies in nature, and that template-based methods should become the method of choice for modeling oligomeric as well as monomeric proteins.
- Research Article
27
- 10.1093/emboj/18.15.4149
- Aug 2, 1999
- The EMBO journal
The Escherichia coli high-affinity ribose transporter is composed of the periplasmic ribose-binding protein (RBP or RbsB), the membrane component (RbsC) and the ATP-binding protein (RbsA). In order to dissect the molecular interactions initiating the transport process, RbsC suppressors for transport-defective rbsB mutations were isolated. These suppressors are localized in two regions of RbsC, which are allele-specific to N- or C-terminal domain mutations of RBP, suggesting that there are two distinct regions of RbsC, each interacting with one of the two domains of RBP. To demonstrate that these two regions provide a homodimeric binding surface for RBP we constructed a dimeric rbsC in which two genes are joined tandemly from head to tail with the addition of a linker. The dimeric RbsC protein is stable and functional in growth and ribose uptake. By exploiting the allele specificity between the domain-specific mutations and their suppressors, we generated all mutation-suppressor combinations in a single rbsB plus the dimeric rbsC genes. Their phenotypes are consistent with the proposal that the binding protein module interacts symmetrically with homodimeric RbsC. The mode of association proposed here for the ribose transport components could be extended to other ABC transporters with similar structural organizations.
- Research Article
7
- 10.1002/prot.20650
- Mar 16, 2006
- Proteins: Structure, Function, and Bioinformatics
Crystal structure of phosphoribosylformylglycinamidine synthase II (smPurL) from Thermotoga maritima at 2.15 Å resolution
- Research Article
101
- 10.1074/jbc.m601278200
- Jul 1, 2006
- Journal of Biological Chemistry
Ferric uptake regulator (Fur) is a global bacterial regulator that uses iron as a cofactor to bind to specific DNA sequences. Escherichia coli Fur is usually isolated as a homodimer with two metal sites per subunit. Metal binding to the iron site induces protein activation; however, the exact role of the structural zinc site is still unknown. Structural studies of three different forms of the Escherichia coli Fur protein (nonactivated dimer, monomer, and truncated Fur-(1-82)) were performed. Dimerization of the oxidized monomer was followed by NMR in the presence of a reductant (dithiothreitol) and Zn(II). Reduction of the disulfide bridges causes only local structure variations, whereas zinc addition to reduced Fur induces protein dimerization. This demonstrates for the first time the essential role of zinc in the stabilization of the quaternary structure. The secondary structures of the mono- and dimeric forms are almost conserved in the N-terminal DNA-binding domain, except for the first helix, which is not present in the nonactivated dimer. In contrast, the C-terminal dimerization domain is well structured in the dimer but appears flexible in the monomer. This is also confirmed by heteronuclear Overhauser effect data. The crystal structure at 1.8 Å resolution of a truncated protein (Fur-(1-82)) is described and found to be identical to the N-terminal domain in the monomeric and in the metal-activated state. Altogether, these data allow us to propose an activation mechanism for E. coli Fur involving the folding/unfolding of the N-terminal helix.
- Research Article
20
- 10.1002/prot.22220
- Sep 2, 2008
- Proteins: Structure, Function, and Bioinformatics
Crystal structure of glutathione‐dependent phospholipid peroxidase Hyr1 from the yeast Saccharomyces cerevisiae
- Research Article
12
- 10.1074/jbc.m112.355883
- Jun 1, 2012
- Journal of Biological Chemistry
Muscle elasticity strongly relies on the mechanical anchoring of the giant protein titin to both the sarcomere M-band and the Z-disk. Such strong attachment ensures the reversible dynamics of the stretching-relaxing cycles determining the muscle passive elasticity. Similarly, the design of biomaterials with enhanced elastic function requires experimental strategies able to secure the constituent molecules to avoid mechanical failure. Here we show that an engineered titin-mimicking protein is able to spontaneously dimerize in solution. Our observations reveal that the titin Z1Z2 domains are key to induce dimerization over a long-range distance in proteins that would otherwise remain in their monomeric form. Using single molecule force spectroscopy, we measure the threshold force that triggers the noncovalent transition from protein dimer to monomer, occurring at ∼700 piconewtons. Such extremely high mechanical stability is likely to be a natural protective mechanism that guarantees muscle integrity. We propose a simple molecular model to understand the force-induced dimer-to-monomer transition based on the geometric distribution of forces occurring within a dimeric protein under mechanical tension.
- Research Article
15
- 10.1002/prot.20420
- Apr 8, 2005
- Proteins: Structure, Function, and Bioinformatics
Crystal structure of an indigoidine synthase A (IndA)‐like protein (TM1464) from Thermotoga maritima at 1.90 Å resolution reveals a new fold
- Research Article
5
- 10.1093/protein/gzn040
- Aug 1, 2008
- Protein Engineering, Design and Selection
The tetrameric green fluorescent protein AsGFP(499) from the sea anemone Anemonia sulcata was converted into a dimeric and monomeric protein by site-directed mutagenesis. The protein was engineered without prior knowledge of its crystal structure based on a sequence alignment of multiple proteins belonging to the GFP-family. Crucial residues for oligomerisation of AsGFP(499) were predicted and selected for mutation. By introduction of a single site mutation (S103K) the A/B subunit was disrupted whereas two substitutions were necessary to separate the A/C subunit (T159K/F173E). This method can be applied as a general predictive method for designing monomeric proteins from multimeric fluorescent proteins. The maturation temperature was optimised to 37 °C by a combination of site-directed and random mutagenesis. In cell-based assays, the NFATc1A (nuclear factor of activated T-cells, subtype 1, isoform A)-AsGFP(499) chimera formed massive cytoplasmic aggregates in HeLa cells, which prevented the shuttling of NFATc1A into the nucleus and consequently its transcriptional activity. In contrast, the cells expressing the NFATc1A in fusion with our engineered dimeric and monomeric fluorescent mutants were homogeneously distributed throughout the cytoplasm, making these stable cell lines functional in both translocation and transcriptional assays. This new dual cellular assay will allow the screening and discovery of new drugs that target NFAT cellular processes.
- Research Article
1
- 10.1101/2023.01.22.525096
- Jan 22, 2023
- bioRxiv
Recent advances in molecular modeling of protein structures are changing the field of structural biology. AlphaFold-2 (AF2), an AI system developed by DeepMind, Inc., utilizes attention-based deep learning to predict models of protein structures with high accuracy relative to structures determined by X-ray crystallography and cryo-electron microscopy (cryoEM). Comparing AF2 models to structures determined using solution NMR data, both high similarities and distinct differences have been observed. Since AF2 was trained on X-ray crystal and cryoEM structures, we assessed how accurately AF2 can model small, monomeric, solution protein NMR structures which (i) were not used in the AF2 training data set, and (ii) did not have homologous structures in the Protein Data Bank at the time of AF2 training. We identified nine open-source protein NMR data sets for such “blind” targets, including chemical shift, raw NMR FID data, NOESY peak lists, and (for 1 case) 15N-1H residual dipolar coupling data. For these nine small (70–108 residues) monomeric proteins, we generated AF2 prediction models and assessed how well these models fit to these experimental NMR data, using several well-established NMR structure validation tools. In most of these cases, the AF2 models fit the NMR data nearly as well, or sometimes better than, the corresponding NMR structure models previously deposited in the Protein Data Bank. These results provide benchmark NMR data for assessing new NMR data analysis and protein structure prediction methods. They also document the potential for using AF2 as a guiding tool in protein NMR data analysis, and more generally for hypothesis generation in structural biology research.
- Research Article
12
- 10.1021/acs.analchem.1c04989
- Jun 16, 2022
- Analytical Chemistry
Protein dimerization, as the most common form of protein-protein interaction, can manifest more significant roles in cellular signaling than individual monomers. For example, excessive formation of EGFR-HER2 dimer has been implicated in cancer development and therapeutic resistance in addition to the overexpression of EGFR and HER2 proteins. Thus, quantitative evaluation of these heterodimers in living cells and revelation of their ratiometric relationship with protein monomers in dimerization may provide insights into clinical cancer management. To achieve this goal, the prerequisite is protein heterodimer quantification. Given the current lack of quantitative methods, we constructed a mass-tagged oligo nanoprobe set for quantification of EGFR-HER2 dimer in living cells. The mass-tagged oligo nanoprobe set contained two targeting probes (nucleic acid aptamers), a connector probe, a hairpin probe, and a photocleavable mass-tagged probe. Two distinct aptamers can recognize target protein monomers and initiate the subsequent hybridization cascade involving binding to the connector probe, formation of an initiator strand, opening of a hairpin probe, and ensuing hybridization with a photocleavable mass-tagged probe. Ultimately, the mass tag was released under ultraviolet light and then subjected to mass spectrometric analysis. In this way, the information regarding the interaction between two protein monomers was successfully converted to the quantitative signal of the mass tag. Using the assay, the expression level of EGFR-HER2 dimer and its relationship with individual protein monomers were determined in four breast cancer cell lines. We are among the first to obtain the absolute level of protein heterodimer, and this quantitative information may be vital in understanding the molecular basis of cancer.
- Research Article
30
- 10.1074/jbc.m803595200
- Nov 1, 2008
- Journal of Biological Chemistry
Several bacterial solute transport mechanisms involve members of the periplasmic binding protein (PBP) superfamily that bind and deliver ligand to integral membrane transport proteins in the ATP-binding cassette, tripartite tricarboxylate transporter, or tripartite ATP-independent (TRAP) families. PBPs involved in ATP-binding cassette transport systems have been well characterized, but only a few PBPs involved in TRAP transport have been studied. We have measured the thermal stability, determined the oligomerization state by small angle x-ray scattering, and solved the x-ray crystal structure to 1.9 Å resolution of a TRAP-PBP (open reading frame tm0322) from the hyperthermophilic bacterium Thermotoga maritima (TM0322). The overall fold of TM0322 is similar to other TRAP transport related PBPs, although the structural similarity of backbone atoms (2.5–3.1 Å root mean square deviation) is unusually low for PBPs within the same group. Individual monomers within the tetrameric asymmetric unit of TM0322 exhibit high root mean square deviation (0.9 Å) to each other as a consequence of conformational heterogeneity in their binding pockets. The gel filtration elution profile and the small angle x-ray scattering analysis indicate that TM0322 assembles as dimers in solution that in turn assemble into a dimer of dimers in the crystallographic asymmetric unit. Tetramerization has been previously observed in another TRAP-PBP (the Rhodobacter sphaeroides alpha-keto acid-binding protein) where quaternary structure formation is postulated to be an important requisite for the transmembrane transport process.