ASD: antigen-specific antibody database

Abstract

The development of computational models for therapeutic antibodies faces significant challenges, particularly in predicting binding affinity across a diverse set of measurements, because data are scarce. A critical data element is the set of antibody–antigen interaction pairs associated with sequences. To address this scarcity, we developed the Antigen Specific Antibody Database (ASD, https://naturalantibody.com/agab/), a database aggregating antibody–antigen interaction data from multiple studies with standardized formatting and annotations. Our dataset compilation strategy drew on 15 distinct sources and yielded 1,097,946 unique antibody–antigen interactions (with 9,575 unique antigens). The ASD captures diverse affinity measures and qualitative binding assessments, along with metadata including UniProt and PDB identifiers, target protein names, confidence levels, and experimental conditions such as type of measured affinity, source organism, and germline genes. Through this integration effort, we make available an ample resource of interaction data gathered from the public domain to act as a foundation for model development and further data generation.
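To make the aggregation step concrete, the following is a minimal Python sketch of how interaction records from several sources might be brought into one standardized form and grouped into unique antibody–antigen pairs. It is an illustration under stated assumptions, not the ASD pipeline: the field names, the `Interaction` class, the helper functions, and the choice to define a "unique interaction" by normalized heavy-chain, light-chain, and antigen sequences are all hypothetical and do not come from the abstract.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Interaction:
    """One reported measurement (hypothetical schema, not the ASD format)."""
    heavy_seq: str
    light_seq: Optional[str]         # may be missing, e.g. heavy-chain-only data
    antigen_seq: str
    affinity_type: str               # e.g. "kd", "ic50", or "binary" for qualitative calls
    affinity_value: Optional[float]  # None when only a qualitative call is reported
    source: str                      # label of the study the record came from


def normalize(seq: Optional[str]) -> Optional[str]:
    """Strip whitespace and uppercase so the same sequence compares equal."""
    return "".join(seq.split()).upper() if seq else None


PairKey = Tuple[Optional[str], Optional[str], Optional[str]]


def aggregate(sources: Iterable[Iterable[Interaction]]) -> Dict[PairKey, List[Interaction]]:
    """Group measurements from all sources by antibody-antigen pair.

    Assumption: a pair is identified by its normalized sequences. Every
    measurement is kept, so one pair can carry several affinity types.
    """
    merged: Dict[PairKey, List[Interaction]] = {}
    for records in sources:
        for rec in records:
            key = (normalize(rec.heavy_seq),
                   normalize(rec.light_seq),
                   normalize(rec.antigen_seq))
            merged.setdefault(key, []).append(rec)
    return merged


def count_unique_antigens(merged: Dict[PairKey, List[Interaction]]) -> int:
    """Count distinct antigen sequences across all unique pairs."""
    return len({key[2] for key in merged})


# Toy data from two made-up studies; the sequences are placeholders.
study_a = [Interaction("evql", "diqm", "MKT", "kd", 1e-9, "study_a")]
study_b = [
    Interaction("EVQL", "DIQM", "mkt", "binary", None, "study_b"),
    Interaction("QVQL", None, "MKT", "ic50", 5e-8, "study_b"),
]
merged = aggregate([study_a, study_b])
print(len(merged), count_unique_antigens(merged))
```

Grouping measurements per pair, rather than keeping a single record, is one way to retain both quantitative affinities and qualitative binding calls for the same pair; a real pipeline would also need to reconcile units and identifiers across sources.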

Similar Papers
  • Research Article
  • Cited by 22
  • 10.1021/ci400045v
Automated Large-Scale File Preparation, Docking, and Scoring: Evaluation of ITScore and STScore Using the 2012 Community Structure–Activity Resource Benchmark
  • May 21, 2013
  • Journal of Chemical Information and Modeling
  • Sam Z Grinter + 4 more

In this study, we use the recently released 2012 Community Structure-Activity Resource (CSAR) data set to evaluate two knowledge-based scoring functions, ITScore and STScore, and a simple force-field-based potential (VDWScore). The CSAR data set contains 757 compounds, most with known affinities, and 57 crystal structures. With the help of the script files for docking preparation, we use the full CSAR data set to evaluate the performances of the scoring functions on binding affinity prediction and active/inactive compound discrimination. The CSAR subset that includes crystal structures is used as well, to evaluate the performances of the scoring functions on binding mode and affinity predictions. Within this structure subset, we investigate the importance of accurate ligand and protein conformational sampling and find that the binding affinity predictions are less sensitive to non-native ligand and protein conformations than the binding mode predictions. We also find the full CSAR data set to be more challenging in making binding mode predictions than the subset with structures. The script files used for preparing the CSAR data set for docking, including scripts for canonicalization of the ligand atoms, are offered freely to the academic community.

  • Research Article
  • Cited by 28
  • 10.1016/j.csbj.2023.11.009
Prediction of protein-ligand binding affinity with deep learning
  • Jan 1, 2023
  • Computational and Structural Biotechnology Journal
  • Yuxiao Wang + 5 more

  • Research Article
  • Cited by 254
  • 10.1093/bioinformatics/bty816
Large-scale prediction of binding affinity in protein-small ligand complexes: the PRODIGY-LIG web server.
  • Sep 20, 2018
  • Bioinformatics
  • Anna Vangone + 7 more

Recently we published PROtein binDIng enerGY (PRODIGY), a web-server for the prediction of binding affinity in protein-protein complexes. By using a combination of simple structural properties, such as the residue-contacts made at the interface, PRODIGY has demonstrated a top performance compared with other state-of-the-art predictors in the literature. Here we present an extension of it, named PRODIGY-LIG, aimed at the prediction of affinity in protein-small ligand complexes. The predictive method, properly readapted for small ligand by making use of atomic instead of residue contacts, has been successfully applied for the blind prediction of 102 protein-ligand complexes during the D3R Grand Challenge 2. PRODIGY-LIG has the advantage of being simple, generic and applicable to any kind of protein-ligand complex. It provides an automatic, fast and user-friendly tool ensuring broad accessibility. PRODIGY-LIG is freely available without registration requirements at http://milou.science.uu.nl/services/PRODIGY-LIG.

  • Research Article
  • Cited by 2
  • 10.3390/bioengineering12050505
StructureNet: Physics-Informed Hybridized Deep Learning Framework for Protein-Ligand Binding Affinity Prediction.
  • May 10, 2025
  • Bioengineering (Basel, Switzerland)
  • Arjun Kaneriya + 5 more

Accurately predicting protein-ligand binding affinity is an important step in the drug discovery process. Deep learning (DL) methods have improved binding affinity prediction by using diverse categories of molecular data. However, many models rely heavily on interaction and sequence data, which impedes proper learning and limits performance in de novo applications. To address these limitations, we developed a novel graph neural network model, called StructureNet (structure-based graph neural network), to predict protein-ligand binding affinity. StructureNet improves existing DL methods by focusing entirely on structural descriptors to mitigate data memorization issues introduced by sequence and interaction data. StructureNet represents the protein and ligand structures as graphs, which are processed using a GNN-based ensemble deep learning model. StructureNet achieved a PCC of 0.68 and an AUC of 0.75 on the PDBBind v.2020 Refined Set, outperforming similar structure-based models. External validation on the DUDE-Z dataset showed that StructureNet can effectively distinguish between active and decoy ligands. Further testing on a small subset of well-known drugs indicates that StructureNet has high potential for rapid virtual screening applications. We also hybridized StructureNet with interaction- and sequence-based models to investigate their impact on testing accuracy and found minimal difference (0.01 PCC) between merged models and StructureNet as a standalone model. An ablation study found that geometric descriptors were the key drivers of model performance, with their removal leading to a PCC decrease of over 15.7%. Lastly, we tested StructureNet on ensembles of binding complex conformers generated using molecular dynamics (MD) simulations and found that incorporating multiple conformations of the same complex often improves model accuracy by capturing binding site flexibility. 
Overall, the results show that structural data alone are sufficient for binding affinity predictions and can address pattern recognition challenges introduced by sequence and interaction features. Additionally, structural representations of protein-ligand complexes can be considerably improved using geometric and topological descriptors. We made the StructureNet GUI freely available online.

  • Research Article
  • Cited by 17
  • 10.1016/j.artmed.2010.05.003
Quantitative prediction of MHC-II binding affinity using particle swarm optimization
  • Jun 11, 2010
  • Artificial Intelligence in Medicine
  • Wen Zhang + 2 more

  • Research Article
  • Cited by 7
  • 10.1002/prot.24366
Using the concept of transient complex for affinity predictions in CAPRI rounds 20–27 and beyond
  • Sep 14, 2013
  • Proteins: Structure, Function, and Bioinformatics
  • Sanbo Qin + 1 more

Predictions of protein-protein binders and binding affinities have traditionally focused on features pertaining to the native complexes. In developing a computational method for predicting protein-protein association rate constants, we introduced the concept of transient complex after mapping the interaction energy surface. The transient complex is located at the outer boundary of the bound-state energy well, having near-native separation and relative orientation between the subunits but not yet formed most of the short-range native interactions. We found that the width of the binding funnel and the electrostatic interaction energy of the transient complex are among the features predictive of binders and binding affinities. These ideas were very promising for the five affinity-related targets (T43-45, 55, and 56) of CAPRI rounds 20-27. For T43, we ranked the single crystallographic complex as number 1 and were one of only two groups that clearly identified that complex as a true binder; for T44, we ranked the only design with measurable binding affinity as number 4. For the nine docking targets, continuing on our success in previous CAPRI rounds, we produced 10 medium-quality models for T47 and acceptable models for T48 and T49. We conclude that the interaction energy landscape and the transient complex in particular will complement existing features in leading to better prediction of binding affinities.

  • Research Article
  • Cited by 12
  • 10.1007/s11030-008-9069-9
Structural features of diverse ligands influencing binding affinities to Estrogen α and Estrogen β receptors. Part I: molecular descriptors calculated from minimal energy conformation of isolated ligands
  • Aug 1, 2007
  • Molecular Diversity
  • Elena Boriani + 3 more

We report a neural network modeling approach combined with a genetic algorithm for predicting the experimental binding affinity of a diverse set of chemicals to human Estrogen Receptor alpha and beta (ER-alpha and ER-beta). The counterpropagation artificial neural network is used as the modeling method. Structural features of ligands having the strongest influence on the binding affinities were investigated. The molecular descriptors were selected in a variable selection procedure based on the genetic algorithm (GA). The 3D descriptors of molecular structures were calculated for the minimal energy conformation of isolated ligands. All the optimized models were tested with an internal and an external set of compounds. The models served for classification and prediction of binding affinities. The optimized models were 100% correct in the classification part, where the active molecules were separated from the inactive ones. The best predictive model of active molecules was assessed with the internal test set, yielding a prediction error of RMS = 0.12, while the predictions for the external test set contain some outliers, which are ascribed to the incompatibility of individual compounds with the structural domain of our model. The influence of the receptor on the conformation of the ligands in the ligand-protein complex is described and discussed in the accompanying paper.

  • Research Article
  • Cited by 17
  • 10.1016/j.ijbiomac.2024.129490
PRA-Pred: Structure-based prediction of protein-RNA binding affinity
  • Jan 13, 2024
  • International Journal of Biological Macromolecules
  • K Harini + 2 more

  • Research Article
  • Cited by 81
  • 10.1093/bioinformatics/btad049
CAPLA: improved prediction of protein-ligand binding affinity by a deep learning approach based on a cross-attention mechanism.
  • Jan 23, 2023
  • Bioinformatics (Oxford, England)
  • Zhi Jin + 7 more

Accurate and rapid prediction of protein-ligand binding affinity is a great challenge currently encountered in drug discovery. Recent advances have shown deep learning-based computational approaches to be a promising alternative for accurately quantifying binding affinity. The structural complementarity between the protein-binding pocket and the ligand has a great effect on the binding strength between a protein and a ligand, but most existing deep learning approaches extract the features of the pocket and the ligand with two detached modules. In this work, a new deep learning approach based on the cross-attention mechanism, named CAPLA, was developed for improved prediction of protein-ligand binding affinity by learning features from sequence-level information of both protein and ligand. Specifically, CAPLA employs the cross-attention mechanism to capture the mutual effect of the protein-binding pocket and the ligand. We evaluated the performance of our proposed CAPLA in comprehensive benchmarking experiments on binding affinity prediction, demonstrating the superior performance of CAPLA over state-of-the-art baseline approaches. Moreover, we provided interpretability for CAPLA to uncover critical functional residues that contribute most to the binding affinity through the analysis of the attention scores generated by the cross-attention mechanism. Consequently, these results indicate that CAPLA is an effective approach for binding affinity prediction and may be useful for downstream applications. The source code of the method along with trained models is freely available at https://github.com/lennylv/CAPLA. Supplementary data are available at Bioinformatics online.

  • Research Article
  • Cited by 27
  • 10.1093/bioinformatics/btae155
Contrastive pre-training and 3D convolution neural network for RNA and small molecule binding affinity prediction.
  • Mar 20, 2024
  • Bioinformatics (Oxford, England)
  • Saisai Sun + 1 more

The diverse structures and functions inherent in RNAs present a wealth of potential drug targets. Some small molecules are anticipated to serve as leading compounds, providing guidance for the development of novel RNA-targeted therapeutics. Consequently, the determination of RNA-small molecule binding affinity is a critical undertaking in the landscape of RNA-targeted drug discovery and development. Nevertheless, to date, only one computational method for RNA-small molecule binding affinity prediction has been proposed. The prediction of RNA-small molecule binding affinity remains a significant challenge. The development of a computational model is deemed essential to effectively extract relevant features and predict RNA-small molecule binding affinity accurately. In this study, we introduced RLaffinity, a novel deep learning model designed for the prediction of RNA-small molecule binding affinity based on 3D structures. RLaffinity integrated information from RNA pockets and small molecules, utilizing a 3D convolutional neural network (3D-CNN) coupled with a contrastive learning-based self-supervised pre-training model. To the best of our knowledge, RLaffinity was the first deep learning-based method for the prediction of RNA-small molecule binding affinity. Our experimental results showed RLaffinity's superior performance compared to baseline methods on all metrics. The efficacy of RLaffinity underscores the capability of 3D-CNN to accurately extract both global pocket information and local neighbor nucleotide information within RNAs. Notably, the integration of a self-supervised pre-training model significantly enhanced predictive performance. Ultimately, RLaffinity also proved to be a potential tool for virtual screening of RNA-targeted drugs. Availability: https://github.com/SaisaiSun/RLaffinity.

  • Research Article
  • Cited by 69
  • 10.1002/bip.21091
Modeling and prediction of binding affinities between the human amphiphysin SH3 domain and its peptide ligands using genetic algorithm‐Gaussian processes
  • Jan 1, 2008
  • Peptide Science
  • Peng Zhou + 3 more

In this article, we discuss the application of the Gaussian process (GP) and other statistical methods (PLS, ANN, and SVM) for the modeling and prediction of binding affinities between the human amphiphysin SH3 domain and its peptide ligands. Divided physicochemical property scores of amino acids, involving significant hydrogen bond, electronic, hydrophobic, and steric properties, were used to characterize the peptide structures, and quantitative structure-affinity relationship models were then constructed by PLS, ANN, SVM, and GP coupled with genetic algorithm-variable selection. The results show that: (i) owing to the significant flexibility and high complexity of polypeptide structures, the linear PLS method was incapable of performing satisfactorily on the SH3 domain binding peptide dataset; (ii) the overfitting involved in the training process decreased the predictive power of the ANN model to some extent; (iii) both SVM and GP perform well on the SH3 domain binding peptide dataset. Moreover, by combining linear and nonlinear terms in the covariance function, the GP is capable of handling hybrid linear and nonlinear relationships, and thus yielded a more stable and predictive model than SVM. Analyses of GP models showed that diverse properties contribute markedly to the interactions between the SH3 domain and the peptides. Particularly, steric property and hydrophobicity of P(2), electronic property of P(0), and electronic property and hydrogen bond property of P(-3) in decapeptide (P(4)P(3)P(2)P(1)P(0)P(-1)P(-2)P(-3)P(-4)P(-5)) significantly contribute to the binding affinities of SH3 domain-peptide interactions.

  • Research Article
  • Cited by 2
  • 10.1186/s12859-022-05107-w
Improved compound–protein interaction site and binding affinity prediction using self-supervised protein embeddings
  • Dec 16, 2022
  • BMC Bioinformatics
  • Jialin Wu + 3 more

Background: Compound–protein interaction site and binding affinity predictions are crucial for drug discovery and drug design. In recent years, many deep learning-based methods have been proposed for predictions related to compound–protein interaction. For protein inputs, how to make use of protein primary sequence and tertiary structure information has an impact on prediction results. Results: In this study, we propose a deep learning model based on a multi-objective neural network for compound–protein interaction site and binding affinity prediction. We used several kinds of self-supervised protein embeddings to enrich our protein inputs and used convolutional neural networks to extract features from them. Our results demonstrate that our model improved on previous models in both interaction site prediction and affinity prediction. In a case study, our model could better predict binding sites, which also showed its effectiveness. Conclusion: These results suggest that our model could be a helpful tool for compound–protein related predictions.

  • Research Article
  • Cited by 6
  • 10.1002/prot.26827
Protein-Ligand Structure and Affinity Prediction in CASP16 Using a Geometric Deep Learning Ensemble and Flow Matching.
  • Apr 8, 2025
  • Proteins
  • Alex Morehead + 4 more

Predicting the structure of ligands bound to proteins is a foundational problem in modern biotechnology and drug discovery, yet little is known about how to combine the predictions of protein-ligand structure (poses) produced by the latest deep learning methods to identify the best poses and how to accurately estimate the binding affinity between a protein target and a list of ligand candidates. Further, a blind benchmarking and assessment of protein-ligand structure and binding affinity prediction is necessary to ensure it generalizes well to new settings. Towards this end, we introduce MULTICOM_ligand, a deep learning-based protein-ligand structure and binding affinity prediction ensemble featuring structural consensus ranking for unsupervised pose ranking and a new deep generative flow matching model for joint structure and binding affinity prediction. Notably, MULTICOM_ligand ranked among the top-5 ligand prediction methods in both protein-ligand structure prediction and binding affinity prediction in the 16th Critical Assessment of Techniques for Structure Prediction (CASP16), demonstrating its efficacy and utility for real-world drug discovery efforts. The source code for MULTICOM_ligand is freely available on GitHub.

  • Research Article
  • Cited by 19
  • 10.1007/s10822-011-9529-7
Exhaustive search and solvated interaction energy (SIE) for virtual screening and affinity prediction
  • Dec 25, 2011
  • Journal of Computer-Aided Molecular Design
  • Traian Sulea + 2 more

We carried out a prospective evaluation of the utility of the SIE (solvated interaction energy) scoring function for virtual screening and binding affinity prediction. Since experimental structures of the complexes were not provided, this was an exercise in virtual docking as well. We used our exhaustive docking program, Wilma, to provide high-quality poses that were rescored using SIE to provide binding affinity predictions. We also tested the combination of SIE with our latest solvation model, first shell of hydration (FiSH), which captures some of the discrete properties of water within a continuum model. We achieved good enrichment in virtual screening of fragments against trypsin, with an area under the curve of about 0.7 for the receiver operating characteristic curve. Moreover, the early enrichment performance was quite good, with 50% of true actives recovered with a 15% false positive rate in a prospective calculation and with a 3% false positive rate in a retrospective application of SIE with FiSH. Binding affinity predictions for both trypsin and host-guest complexes were generally within 2 kcal/mol of the experimental values. However, the rank ordering of affinities differing by 2 kcal/mol or less was not well predicted. On the other hand, it was encouraging that the incorporation of a more sophisticated solvation model into SIE resulted in better discrimination of true binders from binders. This suggests that the inclusion of proper physics in our models is a fruitful strategy for improving the reliability of our binding affinity predictions.

  • Research Article
  • Cited by 66
  • 10.1186/s12859-016-1169-4
Correcting the impact of docking pose generation error on binding affinity prediction.
  • Sep 1, 2016
  • BMC Bioinformatics
  • Hongjian Li + 3 more

Background: Pose generation error is usually quantified as the difference between the geometry of the pose generated by the docking software and that of the same molecule co-crystallised with the considered protein. Surprisingly, the impact of this error on binding affinity prediction is yet to be systematically analysed across diverse protein-ligand complexes. Results: Against commonly held views, we have found that pose generation error generally has a small impact on the accuracy of binding affinity prediction. This is also true for large pose generation errors, and it is observed not only with machine-learning scoring functions but also with classical scoring functions such as AutoDock Vina. Furthermore, we propose a procedure to correct a substantial part of this error, which consists of calibrating the scoring functions with re-docked, rather than co-crystallised, poses. In this way, the relationship between Vina-generated protein-ligand poses and their binding affinities is directly learned. As a result, test set performance after this error-correcting procedure is much closer to that of predicting the binding affinity in the absence of pose generation error (i.e. on crystal structures). We evaluated several strategies, obtaining better results for those using a single docked pose per ligand than for those using multiple docked poses per ligand. Conclusions: Binding affinity prediction is often carried out on the docked pose of a known binder rather than its co-crystallised pose. Our results suggest that pose generation error is in general far less damaging for binding affinity prediction than is currently believed. Another contribution of our study is the proposal of a procedure that largely corrects for this error.
The resulting machine-learning scoring function is freely available at http://istar.cse.cuhk.edu.hk/rf-score-4.tgz and http://ballester.marseille.inserm.fr/rf-score-4.tgz. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-1169-4) contains supplementary material, which is available to authorized users.
