Comparative evaluation of the prediction accuracy of AlphaFold and ESMFold for monomeric and dimeric proteins

Abstract

We evaluated the prediction accuracy of three tools, the deep-learning-based AlphaFold2 and AlphaFold3 and the large-language-model-based ESMFold, using experimentally derived structures deposited in the Protein Data Bank between 2022 and 2024, excluding entries with close homologs among structures released prior to 2022. Based on the criteria of sequence identity lower than 40% and query coverage below 70%, 1666 monomeric and 994 dimeric proteins were selected as challenging targets for benchmarking. Our analysis showed that AlphaFold2 and AlphaFold3 correctly predicted 88% of monomeric structures and 77% of dimeric proteins, whereas ESMFold accurately predicted 76% of the monomeric proteins and 41% of the dimeric proteins. Most incorrect predictions involved nuclear magnetic resonance structures; when benchmarking was restricted to X-ray and cryo-electron microscopy structures, the prediction accuracy of AlphaFold and ESMFold for monomeric proteins was 95% and 83%, respectively. Overall, these findings demonstrate significant differences in the prediction accuracy of these machine learning (ML)-based tools for monomeric and dimeric proteins, highlighting their advantages and limitations. Finally, to facilitate access to the benchmarking data, we developed ProModEv (Protein Model Evaluation portal), an interactive web portal for systematic analysis of these benchmarking results, available at http://pdbi.nii.ac.in/ProModEv/.

Similar Papers
  • Research Article
  • Citations: 28
  • 10.1016/j.jmr.2023.107481
Blind assessment of monomeric AlphaFold2 protein structure models with experimental NMR data
  • May 20, 2023
  • Journal of magnetic resonance (San Diego, Calif. : 1997)
  • Ethan H Li + 8 more

Recent advances in molecular modeling of protein structures are changing the field of structural biology. AlphaFold-2 (AF2), an AI system developed by DeepMind, Inc., utilizes attention-based deep learning to predict models of protein structures with high accuracy relative to structures determined by X-ray crystallography and cryo-electron microscopy (cryoEM). Comparing AF2 models to structures determined using solution NMR data, both high similarities and distinct differences have been observed. Since AF2 was trained on X-ray crystal and cryoEM structures, we assessed how accurately AF2 can model small, monomeric, solution protein NMR structures which (i) were not used in the AF2 training data set, and (ii) did not have homologous structures in the Protein Data Bank at the time of AF2 training. We identified nine open-source protein NMR data sets for such “blind” targets, including chemical shift, raw NMR FID data, NOESY peak lists, and (for 1 case) 15N-1H residual dipolar coupling data. For these nine small (70–108 residues) monomeric proteins, we generated AF2 prediction models and assessed how well these models fit to these experimental NMR data, using several well-established NMR structure validation tools. In most of these cases, the AF2 models fit the NMR data nearly as well, or sometimes better than, the corresponding NMR structure models previously deposited in the Protein Data Bank. These results provide benchmark NMR data for assessing new NMR data analysis and protein structure prediction methods. They also document the potential for using AF2 as a guiding tool in protein NMR data analysis, and more generally for hypothesis generation in structural biology research.

  • Research Article
  • Citations: 23
  • 10.1016/j.bios.2011.07.011
Enhancing immunoassay detection of antigens with multimeric protein Gs
  • Jul 18, 2011
  • Biosensors and Bioelectronics
  • Jin Hyung Lee + 4 more

  • Peer Review Report
  • 10.7554/elife.73862.sa0
Editor's evaluation: Destabilizers of the thymidylate synthase homodimer accelerate its proteasomal degradation and inhibit cancer growth
  • Feb 15, 2022
  • Goutham Narla

  • Peer Review Report
  • 10.7554/elife.73862.sa1
Decision letter: Destabilizers of the thymidylate synthase homodimer accelerate its proteasomal degradation and inhibit cancer growth
  • Feb 15, 2022
  • Yatrik M Shah

  • Research Article
  • Citations: 16
  • 10.1002/pro.2361
Structural templates for modeling homodimers
  • Sep 20, 2013
  • Protein Science
  • Petras J Kundrotas + 2 more

Oligomeric proteins are more abundant in nature than monomeric proteins, and involved in all biological processes. In the absence of an experimental structure, their subunits can be modeled from their sequence like monomeric proteins, but reliable procedures to build the oligomeric assembly are scarce. Template-based methods, which start from known protein structures, are commonly applied to model subunits. We present a method to model homodimers that relies on a structural alignment of the subunits, and test it on a set of 511 target structures recently released by the Protein Data Bank, taking as templates the earlier released structures of 3108 homodimeric proteins (H-set), and 2691 monomeric proteins that form dimer-like assemblies in crystals (M-set). The structural alignment identifies a H-set template for 97% of the targets, and in half of the cases, it yields a correct model of the dimer geometry and residue-residue contacts in the target. It also identifies a M-set template for most of the targets, and some of the crystal dimers are very similar to the target homodimers. The procedure efficiently detects homology at low levels of sequence identities, and points to erroneous quaternary structures in the Protein Data Bank. The high coverage of the target set suggests that the content of the Protein Data Bank already approaches the structural diversity of protein assemblies in nature, and that template-based methods should become the choice method for modeling oligomeric as well as monomeric proteins.

  • Research Article
  • Citations: 27
  • 10.1093/emboj/18.15.4149
Molecular interactions in ribose transport: the binding protein module symmetrically associates with the homodimeric membrane transporter.
  • Aug 2, 1999
  • The EMBO journal
  • Y Park

The Escherichia coli high-affinity ribose transporter is composed of the periplasmic ribose-binding protein (RBP or RbsB), the membrane component (RbsC) and the ATP-binding protein (RbsA). In order to dissect the molecular interactions initiating the transport process, RbsC suppressors for transport-defective rbsB mutations were isolated. These suppressors are localized in two regions of RbsC, which are allele-specific to N- or C-terminal domain mutations of RBP, suggesting that there are two distinct regions of RbsC, each interacting with one of the two domains of RBP. To demonstrate that these two regions provide a homodimeric binding surface for RBP we constructed a dimeric rbsC in which two genes are joined tandemly from head to tail with the addition of a linker. The dimeric RbsC protein is stable and functional in growth and ribose uptake. By exploiting the allele specificity between the domain-specific mutations and their suppressors, we generated all mutation-suppressor combinations in a single rbsB plus the dimeric rbsC genes. Their phenotypes are consistent with the proposal that the binding protein module interacts symmetrically with homodimeric RbsC. The mode of association proposed here for the ribose transport components could be extended to other ABC transporters with similar structural organizations.

  • Research Article
  • Citations: 7
  • 10.1002/prot.20650
Crystal structure of phosphoribosylformylglycinamidine synthase II (smPurL) from Thermotoga maritima at 2.15 Å resolution
  • Mar 16, 2006
  • Proteins: Structure, Function, and Bioinformatics
  • I.I Mathews + 40 more

  • Research Article
  • Citations: 101
  • 10.1074/jbc.m601278200
Structural Changes of Escherichia coli Ferric Uptake Regulator during Metal-dependent Dimerization and Activation Explored by NMR and X-ray Crystallography
  • Jul 1, 2006
  • Journal of Biological Chemistry
  • Ludovic Pecqueur + 7 more

Ferric uptake regulator (Fur) is a global bacterial regulator that uses iron as a cofactor to bind to specific DNA sequences. Escherichia coli Fur is usually isolated as a homodimer with two metal sites per subunit. Metal binding to the iron site induces protein activation; however, the exact role of the structural zinc site is still unknown. Structural studies of three different forms of the Escherichia coli Fur protein (nonactivated dimer, monomer, and truncated Fur-(1-82)) were performed. Dimerization of the oxidized monomer was followed by NMR in the presence of a reductant (dithiothreitol) and Zn(II). Reduction of the disulfide bridges causes only local structure variations, whereas zinc addition to reduced Fur induces protein dimerization. This demonstrates for the first time the essential role of zinc in the stabilization of the quaternary structure. The secondary structures of the mono- and dimeric forms are almost conserved in the N-terminal DNA-binding domain, except for the first helix, which is not present in the nonactivated dimer. In contrast, the C-terminal dimerization domain is well structured in the dimer but appears flexible in the monomer. This is also confirmed by heteronuclear Overhauser effect data. The crystal structure at 1.8 Å resolution of a truncated protein (Fur-(1-82)) is described and found to be identical to the N-terminal domain in the monomeric and in the metal-activated state. Altogether, these data allow us to propose an activation mechanism for E. coli Fur involving the folding/unfolding of the N-terminal helix.

  • Research Article
  • Citations: 20
  • 10.1002/prot.22220
Crystal structure of glutathione‐dependent phospholipid peroxidase Hyr1 from the yeast Saccharomyces cerevisiae
  • Sep 2, 2008
  • Proteins: Structure, Function, and Bioinformatics
  • Wen‐Juan Zhang + 5 more

  • Research Article
  • Citations: 12
  • 10.1074/jbc.m112.355883
Spontaneous Dimerization of Titin Protein Z1Z2 Domains Induces Strong Nanomechanical Anchoring
  • Jun 1, 2012
  • Journal of Biological Chemistry
  • Sergi Garcia-Manyes + 4 more

Muscle elasticity strongly relies on the mechanical anchoring of the giant protein titin to both the sarcomere M-band and the Z-disk. Such strong attachment ensures the reversible dynamics of the stretching-relaxing cycles determining the muscle passive elasticity. Similarly, the design of biomaterials with enhanced elastic function requires experimental strategies able to secure the constituent molecules to avoid mechanical failure. Here we show that an engineered titin-mimicking protein is able to spontaneously dimerize in solution. Our observations reveal that the titin Z1Z2 domains are key to induce dimerization over a long-range distance in proteins that would otherwise remain in their monomeric form. Using single molecule force spectroscopy, we measure the threshold force that triggers the noncovalent transition from protein dimer to monomer, occurring at ∼700 piconewtons. Such extremely high mechanical stability is likely to be a natural protective mechanism that guarantees muscle integrity. We propose a simple molecular model to understand the force-induced dimer-to-monomer transition based on the geometric distribution of forces occurring within a dimeric protein under mechanical tension.

  • Research Article
  • Citations: 15
  • 10.1002/prot.20420
Crystal structure of an indigoidine synthase A (IndA)‐like protein (TM1464) from Thermotoga maritima at 1.90 Å resolution reveals a new fold
  • Apr 8, 2005
  • Proteins: Structure, Function, and Bioinformatics
  • Inna Levin + 49 more

  • Research Article
  • Citations: 5
  • 10.1093/protein/gzn040
Engineering of a monomeric fluorescent protein AsGFP499 and its applications in a dual translocation and transcription assay
  • Aug 1, 2008
  • Protein Engineering, Design and Selection
  • Aynur Tasdemir + 6 more

The tetrameric green fluorescent protein AsGFP(499) from the sea anemone Anemonia sulcata was converted into a dimeric and monomeric protein by site-directed mutagenesis. The protein was engineered without prior knowledge of its crystal structure based on a sequence alignment of multiple proteins belonging to the GFP-family. Crucial residues for oligomerisation of AsGFP(499) were predicted and selected for mutation. By introduction of a single site mutation (S103K) the A/B subunit was disrupted whereas two substitutions were necessary to separate the A/C subunit (T159K/F173E). This method can be applied as a general predictive method for designing monomeric proteins from multimeric fluorescent proteins. The maturation temperature was optimised to 37 °C by a combination of site-directed and random mutagenesis. In cell-based assays, the NFATc1A (nuclear factor of activated T-cells, subtype 1, isoform A)-AsGFP(499) chimera formed massive cytoplasmic aggregates in HeLa cells, which prevented the shuttling of NFATc1A into the nucleus and consequently its transcriptional activity. In contrast, the cells expressing the NFATc1A in fusion with our engineered dimeric and monomeric fluorescent mutants were homogeneously distributed throughout the cytoplasm, making these stable cell lines functional in both translocation and transcriptional assays. This new dual cellular assay will allow the screening and discovery of new drugs that target NFAT cellular processes.

  • Research Article
  • Citations: 1
  • 10.1101/2023.01.22.525096
Blind Assessment of Monomeric AlphaFold2 Protein Structure Models with Experimental NMR Data
  • Jan 22, 2023
  • bioRxiv
  • Ethan H Li + 8 more

Recent advances in molecular modeling of protein structures are changing the field of structural biology. AlphaFold-2 (AF2), an AI system developed by DeepMind, Inc., utilizes attention-based deep learning to predict models of protein structures with high accuracy relative to structures determined by X-ray crystallography and cryo-electron microscopy (cryoEM). Comparing AF2 models to structures determined using solution NMR data, both high similarities and distinct differences have been observed. Since AF2 was trained on X-ray crystal and cryoEM structures, we assessed how accurately AF2 can model small, monomeric, solution protein NMR structures which (i) were not used in the AF2 training data set, and (ii) did not have homologous structures in the Protein Data Bank at the time of AF2 training. We identified nine open source protein NMR data sets for such “blind” targets, including chemical shift, raw NMR FID data, NOESY peak lists, and (for 1 case) 15N-1H residual dipolar coupling data. For these nine small (70 – 108 residues) monomeric proteins, we generated AF2 prediction models and assessed how well these models fit to these experimental NMR data, using several well-established NMR structure validation tools. In most of these cases, the AF2 models fit the NMR data nearly as well, or sometimes better than, the corresponding NMR structure models previously deposited in the Protein Data Bank. These results provide benchmark NMR data for assessing new NMR data analysis and protein structure prediction methods. They also document the potential for using AF2 as a guiding tool in protein NMR data analysis, and more generally for hypothesis generation in structural biology research.

  • Research Article
  • Citations: 12
  • 10.1021/acs.analchem.1c04989
Construction of a Mass-Tagged Oligo Probe Set for Revealing Protein Ratiometric Relationship Associated with EGFR-HER2 Heterodimerization in Living Cells.
  • Jun 16, 2022
  • Analytical Chemistry
  • Xiaoxu Li + 4 more

Protein dimerization, as the most common form of protein-protein interaction, can manifest more significant roles in cellular signaling than individual monomers. For example, excessive formation of EGFR-HER2 dimer has been implicated in cancer development and therapeutic resistance in addition to the overexpression of EGFR and HER2 proteins. Thus, quantitative evaluation of these heterodimers in living cells and revelation of their ratiometric relationship with protein monomers in dimerization may provide insights into clinical cancer management. To achieve this goal, the prerequisite is protein heterodimer quantification. Given the current lack of quantitative methods, we constructed a mass-tagged oligo nanoprobe set for quantification of EGFR-HER2 dimer in living cells. The mass-tagged oligo nanoprobe set contained two targeting probes (nucleic acid aptamers), a connector probe, a hairpin probe, and a photocleavable mass-tagged probe. Two distinct aptamers can recognize target protein monomers and initiate the subsequent hybridization cascade involving binding to the connector probe, formation of an initiator strand, opening of a hairpin probe, and ensuing hybridization with a photocleavable mass-tagged probe. Ultimately, the mass tag was released under ultraviolet light and then subjected to mass spectrometric analysis. In this way, the information regarding the interaction between two protein monomers was successfully converted to the quantitative signal of the mass tag. Using the assay, the expression level of EGFR-HER2 dimer and its relationship with individual protein monomers were determined in four breast cancer cell lines. We are among the first to obtain the absolute level of protein heterodimer, and this quantitative information may be vital in understanding the molecular basis of cancer.

  • Research Article
  • Citations: 30
  • 10.1074/jbc.m803595200
Structural Analysis of a Periplasmic Binding Protein in the Tripartite ATP-independent Transporter Family Reveals a Tetrameric Assembly That May Have a Role in Ligand Transport
  • Nov 1, 2008
  • Journal of Biological Chemistry
  • Matthew J Cuneo + 5 more

Several bacterial solute transport mechanisms involve members of the periplasmic binding protein (PBP) superfamily that bind and deliver ligand to integral membrane transport proteins in the ATP-binding cassette, tripartite tricarboxylate transporter, or tripartite ATP-independent (TRAP) families. PBPs involved in ATP-binding cassette transport systems have been well characterized, but only a few PBPs involved in TRAP transport have been studied. We have measured the thermal stability, determined the oligomerization state by small angle x-ray scattering, and solved the x-ray crystal structure to 1.9 Å resolution of a TRAP-PBP (open reading frame tm0322) from the hyperthermophilic bacterium Thermotoga maritima (TM0322). The overall fold of TM0322 is similar to other TRAP transport related PBPs, although the structural similarity of backbone atoms (2.5-3.1 Å root mean square deviation) is unusually low for PBPs within the same group. Individual monomers within the tetrameric asymmetric unit of TM0322 exhibit high root mean square deviation (0.9 Å) to each other as a consequence of conformational heterogeneity in their binding pockets. The gel filtration elution profile and the small angle x-ray scattering analysis indicate that TM0322 assembles as dimers in solution that in turn assemble into a dimer of dimers in the crystallographic asymmetric unit. Tetramerization has been previously observed in another TRAP-PBP (the Rhodobacter sphaeroides alpha-keto acid-binding protein) where quaternary structure formation is postulated to be an important requisite for the transmembrane transport process.
