De novo drug design through gradient-based regularized search in information-theoretically controlled latent space

Over the last decade, automatic chemical design frameworks for discovering molecules with drug-like properties have progressed significantly. Among them, the variational autoencoder (VAE) is a cutting-edge approach that models a tractable latent space of the molecular space. In particular, the use of a VAE along with a property estimator has attracted considerable interest because it enables gradient-based optimization of a given molecule. However, although successful results have been achieved experimentally, the theoretical background and prerequisites for the correct operation of this method have not yet been clarified. In view of the above, we theoretically analyze and rigorously reconstruct the entire framework. From the perspective of parameterized distributions and information theory, we first describe how the previous model overcomes the limitations of the beta VAE in discovering molecules with the desired properties. Furthermore, we describe the prerequisites for training the above model. Next, from the log-likelihood perspective of each term, we reformulate the objectives for exploring the latent space to generate drug-like molecules. The distributional constraints defined in this study steer the search away from invalid molecules. We demonstrate that our model can discover a novel chemical compound targeting BCL-2 family proteins in a de novo approach. Through theoretical analysis and practical implementation, the importance of the aforementioned prerequisites and constraints for operating the model was verified.
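
As a rough illustration of the gradient-based regularized search this abstract describes, the sketch below runs gradient ascent on a latent vector under a toy objective. The quadratic property estimator centered on a hypothetical `target`, the Gaussian-prior regularizer weight `lam`, and all function names are assumptions made for illustration; they are not the paper's model or objective.

```python
# Toy gradient-based regularized search in a latent space.
# Assumptions (not from the paper):
#   property estimator  f(z) = -||z - target||^2
#   regularizer         -(lam / 2) * ||z||^2, i.e. the log-density of a
#                       standard Gaussian prior up to a constant, scaled by lam

def objective_gradient(z, target, lam):
    """Gradient of J(z) = -||z - target||^2 - (lam / 2) * ||z||^2."""
    return [-2.0 * (zi - ti) - lam * zi for zi, ti in zip(z, target)]

def latent_search(z0, target, lam=1.0, lr=0.1, steps=200):
    """Plain gradient ascent on J, starting from the latent code z0."""
    z = list(z0)
    for _ in range(steps):
        grad = objective_gradient(z, target, lam)
        z = [zi + lr * gi for zi, gi in zip(z, grad)]
    return z

z_opt = latent_search(z0=[0.0, 0.0], target=[3.0, -1.5])
```

Under these assumptions the search settles at `2 * target / (2 + lam)`, so the regularizer pulls the optimum toward the prior mean instead of letting it drift to the unconstrained maximum of the estimator. In a real VAE pipeline the final latent code would then be decoded into a molecule, a step this sketch omits.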

Open Access Just Published
Development of human lactate dehydrogenase A inhibitors: high-throughput screening, molecular dynamics simulation and enzyme activity assay.

Lactate dehydrogenase A (LDHA) is highly expressed in many tumor cells and promotes the conversion of pyruvate to lactic acid in the glucose pathway, providing energy and synthetic precursors for the rapid proliferation of tumor cells. Inhibition of LDHA has therefore become a widely studied tumor treatment strategy. However, the research and development of highly efficient, low-toxicity small-molecule LDHA inhibitors still faces challenges. To discover potential inhibitors of LDHA, virtual screening based on molecular docking was performed on the Specs database of more than 260,000 compounds and the Chemdiv-smart database of more than 1,000 compounds. Through molecular dynamics (MD) simulation studies, we identified 12 potential LDHA inhibitors, all of which can stably bind to human LDHA protein and form multiple interactions with its active-center residues. To verify the inhibitory activities of these compounds, we established an enzyme activity assay system and measured their inhibitory effects on recombinant human LDHA. The results showed that Compound 6 could inhibit the catalytic effect of LDHA on pyruvate in a dose-dependent manner with an EC50 value of 14.54 ± 0.83 µM. Further in vitro experiments showed that Compound 6 could significantly inhibit the proliferation of various tumor cell lines, such as pancreatic cancer cells and lung cancer cells, reduce intracellular lactic acid content and increase the intracellular reactive oxygen species (ROS) level. In summary, through virtual screening and in vitro validation, we found that Compound 6 is a small-molecule inhibitor of LDHA, providing a good lead compound for the research and development of LDHA-targeted anti-tumor drugs.
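
To make the reported dose dependence concrete, the sketch below evaluates a standard Hill-type dose-response curve at the EC50 given in the abstract (14.54 µM). The Hill coefficient of 1 and the function name are assumptions; the abstract does not state which curve model was fitted.

```python
# Hill-type dose-response curve (a common model; the fitted model in the
# study is not stated in the abstract, so this is an assumption).

EC50_UM = 14.54  # EC50 of Compound 6 reported in the abstract, in micromolar

def fractional_inhibition(conc_um, ec50_um=EC50_UM, hill=1.0):
    """Fraction of maximal inhibition at a given concentration (micromolar)."""
    if conc_um <= 0:
        return 0.0
    return conc_um ** hill / (ec50_um ** hill + conc_um ** hill)

# Inhibition rises monotonically with dose and is half-maximal at the EC50.
curve = [(c, fractional_inhibition(c)) for c in (1.0, 5.0, 14.54, 50.0, 100.0)]
```

By construction the curve reaches half of its maximum at the EC50, which is how that value should be read here.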

Development of QSARs for cysteine-containing di- and tripeptides with antioxidant activity: influence of the cysteine position.

Antioxidant agents play an essential role in the food industry by improving the oxidative stability of food products. In recent years, the search for new natural antioxidants has increased due to the potentially high toxicity of chemical additives. The synthesis and evaluation of the antioxidant activity of peptides is therefore a field of current research. In this study, we performed a Quantitative Structure-Activity Relationship (QSAR) analysis of 19 cysteine-containing dipeptides and 19 cysteine-containing tripeptides. The main objective is to provide information on the relationship between the structure of peptides and their antioxidant activity. For this purpose, 1D and 2D molecular descriptors were calculated using the PaDEL software, which provides information about the structure, shape, size, charge, polarity, solubility and other aspects of the compounds. Different QSAR models for di- and tripeptides were developed. The statistical parameters for the dipeptide model (R2train = 0.947 and R2test = 0.804) and for the tripeptide model (R2train = 0.923 and R2test = 0.847) indicate that the generated models have high predictive capacity. Then, the influence of the cysteine position was analyzed by predicting the antioxidant activity of new di- and tripeptides and comparing them with glutathione. In dipeptides, except for SC, TC and VC, the activity increases when cysteine is at the N-terminal position. For tripeptides, we observed a notable increase in activity when cysteine is placed in the N-terminal position.
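
For readers unfamiliar with the R2 statistics quoted above, here is a minimal sketch of the usual coefficient of determination. It assumes the textbook definition (1 minus residual sum of squares over total sum of squares); the abstract does not say which exact variant was used for R2test, and the function name is illustrative.

```python
# Coefficient of determination, R^2 = 1 - SS_res / SS_tot.
# Assumption: textbook definition; QSAR studies sometimes use other variants
# for external test sets.

def r_squared(observed, predicted):
    """Return R^2 for paired observed and predicted activity values."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

perfect = r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])    # exact predictions
mean_only = r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])  # always predicts the mean
```

A model that always predicts the mean scores 0 and a perfect model scores 1, so values such as 0.804 and 0.847 on held-out peptides sit well toward the predictive end of that scale.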

From mundane to surprising nonadditivity: drivers and impact on ML models.

Nonadditivity (NA) in Structure-Activity and Structure-Property Relationship (SAR) data is a rare but very information-rich phenomenon. It can indicate conformational flexibility, structural rearrangements, and errors in assay results and structural assignment. While purely ligand-based conformational causes of NA are rather well understood and mundane, other factors are less so and cause surprising NA that has a huge influence on SAR analysis and ML model performance. We here report a systematic analysis across a wide range of properties (20 on-target biological activities and 4 physicochemical ADME-related properties) to understand the frequency of the different phenomena that may lead to NA. A set of novel descriptors was developed to characterize double transformation cycles and identify trends in NA. Double transformation cycles were classified into "surprising" and "mundane" categories, with the majority being classed as mundane. We also examined commonalities among surprising cycles, finding LogP differences to have the most significant impact on NA. A distinct behavior of NA for on-target sets compared to ADME sets was observed. Finally, we show that machine learning models struggle with highly nonadditive data, indicating that a better understanding of NA is an important future research direction.
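
The double transformation cycles mentioned above can be sketched as a four-compound calculation. The sketch assumes the commonly used definition of nonadditivity as the difference between the effect of one transformation with and without the other; the compound labels and function name are hypothetical, and the paper's own descriptors are not reproduced here.

```python
# Nonadditivity of a double transformation cycle (assumed common definition).
#   A: parent compound          B: A + transformation 1
#   C: A + transformation 2     D: A + both transformations
# If the two transformations act independently, the effect of transformation 1
# is the same with or without transformation 2, and the result is 0.

def nonadditivity(p_a, p_b, p_c, p_d):
    """Return (D - C) - (B - A) for log-scale activities such as pIC50."""
    return (p_d - p_c) - (p_b - p_a)

additive_cycle = nonadditivity(6.0, 7.0, 6.5, 7.5)     # both effects add up
nonadditive_cycle = nonadditivity(6.0, 7.0, 6.5, 9.0)  # combined effect is larger
```

Values near zero correspond to additive SAR, while large absolute values flag cycles worth inspecting for the causes the abstract lists.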

MDFit: automated molecular simulations workflow enables high throughput assessment of ligands-protein dynamics.

Molecular dynamics (MD) simulation is a powerful tool for characterizing ligand-protein conformational dynamics and offers significant advantages over docking and other rigid structure-based computational methods. However, setting up, running, and analyzing MD simulations continues to be a multi-step process making it cumbersome to assess a library of ligands in a protein binding pocket using MD. We present an automated workflow that streamlines setting up, running, and analyzing Desmond MD simulations for protein-ligand complexes using machine learning (ML) models. The workflow takes a library of pre-docked ligands and a prepared protein structure as input, sets up and runs MD with each protein-ligand complex, and generates simulation fingerprints for each ligand. Simulation fingerprints (SimFP) capture protein-ligand compatibility, including stability of different ligand-pocket interactions and other useful metrics that enable easy rank-ordering of the ligand library for pocket optimization. SimFPs from a ligand library are used to build & deploy ML models that predict binding assay outcomes and automatically infer important interactions. Unlike relative free-energy methods that are constrained to assess ligands with high chemical similarity, ML models based on SimFPs can accommodate diverse ligand sets. We present two case studies on how SimFP helps delineate structure-activity relationship (SAR) trends and explain potency differences across matched-molecular pairs of (1) cyclic peptides targeting PD-L1 and (2) small molecule inhibitors targeting CDK9.

Structural impacts of two disease-linked ADAR1 mutants: a molecular dynamics study.

Adenosine deaminases acting on RNA (ADARs) are pivotal RNA-editing enzymes responsible for converting adenosine to inosine within double-stranded RNA (dsRNA). Dysregulation of ADAR1 editing activity, often arising from genetic mutations, has been linked to elevated interferon levels and the onset of autoinflammatory diseases. However, understanding the molecular underpinnings of this dysregulation is impeded by the lack of an experimentally determined structure for the ADAR1 deaminase domain. In this computational study, we used homology modeling and AlphaFold2 to construct structural models of the ADAR1 deaminase domain in the wild type and two pathogenic variants, R892H and Y1112F, to decipher the structural basis of their reduced deaminase activity. Our findings illuminate the critical role of structural complementarity between the ADAR1 deaminase domain and dsRNA in enzyme-substrate recognition. That is, the relative position of E1008 and K1120 must be maintained so that they can insert into the minor and major grooves of the substrate dsRNA, respectively, facilitating the flipping-out of adenosine to be accommodated within a cavity surrounding E912. Both amino acid replacements studied, R892H at the orthosteric site and Y1112F at the allosteric site, alter the K1120 position and ultimately hinder substrate RNA binding.

User-centric design of a 3D search interface for protein-ligand complexes

In this work, we present the frontend of GeoMine and showcase its application, focusing on the new features of its latest version. GeoMine is a search engine for ligand-bound and predicted empty binding sites in the Protein Data Bank. In addition to its basic text-based search functionalities, GeoMine offers a geometric query type for searching binding sites with a specific relative spatial arrangement of chemical features such as heavy atoms and intermolecular interactions. In contrast to a text search that requires simple and easy-to-formulate user input, a 3D input is more complex, and its specification can be challenging for users. GeoMine’s new version aims to address this issue from the graphical user interface perspective by introducing an additional visualization concept and a new query template type. In its latest version, GeoMine extends its query-building capabilities primarily through input formulation in 2D. The 2D editor is fully synchronized with GeoMine’s 3D editor and provides the same functionality. It enables template-free query generation and template-based query selection directly in 2D pose diagrams. In addition, the query generation with the 3D editor now supports predicted empty binding sites for AlphaFold structures as query templates. GeoMine is freely accessible on the ProteinsPlus web server (https://proteins.plus).

Open Access
Correlation of protein binding pocket properties with hits’ chemistries used in generation of ultra-large virtual libraries

Although the size of virtual libraries of synthesizable compounds is growing rapidly, we are still enumerating only tiny fractions of the drug-like chemical universe. Our capability to mine these newly generated libraries also lags their growth. That is why fragment-based approaches that utilize on-demand virtual combinatorial libraries are gaining popularity in drug discovery. These à la carte libraries utilize synthetic blocks found to be effective binders in parts of target protein pockets and a variety of reliable chemistries to connect them. There is, however, no data on the potential impact of the chemistries used for making on-demand libraries on the hit rates during virtual screening. There are also no rules to guide in the selection of these synthetic methods for production of custom libraries. We have used the SAVI (Synthetically Accessible Virtual Inventory) library, constructed using 53 reliable reaction types (transforms), to evaluate the impact of these chemistries on docking hit rates for 40 well-characterized protein pockets. The data shows that the virtual hit rates differ significantly for different chemistries with cross coupling reactions such as Sonogashira, Suzuki–Miyaura, Hiyama and Liebeskind–Srogl coupling producing the highest hit rates. Virtual hit rates appear to depend not only on the property of the formed chemical bond but also on the diversity of available building blocks and the scope of the reaction. The data identifies reactions that deserve wider use through increasing the number of corresponding building blocks and suggests the reactions that are more effective for pockets with certain physical and hydrogen bond-forming properties.

Open Access
Reactivities of acrylamide warheads toward cysteine targets: a QM/ML approach to covalent inhibitor design.

Covalent inhibition offers many advantages over non-covalent inhibition, but covalent warhead reactivity must be carefully balanced to maintain potency while avoiding unwanted side effects. While warhead reactivities are commonly measured with assays, a computational model to predict warhead reactivities could be useful for several aspects of the covalent inhibitor design process. Studies have shown correlations between covalent warhead reactivities and quantum mechanical (QM) properties that describe important aspects of the covalent reaction mechanism. However, the models from these studies are often linear regression equations and can have limitations associated with their usage. Applications of machine learning (ML) models that predict covalent warhead reactivities from QM descriptors are rare in the literature. This study uses QM descriptors, calculated at different levels of theory, to train ML models to predict reactivities of covalent acrylamide warheads. The QM/ML models are compared with linear regression models built upon the same QM descriptors and with ML models trained on structure-based features like Morgan fingerprints and RDKit descriptors. Experiments show that the QM/ML models outperform the linear regression models and the structure-based ML models, and literature test sets demonstrate the power of the QM/ML models to predict reactivities of unseen acrylamide warhead scaffolds. Ultimately, these QM/ML models are effective, computationally feasible tools that can expedite the design of new covalent inhibitors.

De novo drug design as GPT language modeling: large chemistry models with supervised and reinforcement learning

In recent years, generative machine learning algorithms have been successful in designing innovative drug-like molecules. SMILES is a sequence-like language used in most effective drug design models. Due to the data's sequential structure, models such as recurrent neural networks and transformers can design pharmacological compounds with optimized efficacy. Large language models have advanced recently, but their implications for drug design have not yet been explored. Although one study successfully pre-trained a large chemistry model (LCM), its application to specific tasks in drug discovery is unknown. In this study, the drug design task is modeled as a causal language modeling problem. Thus, the procedure of reward modeling, supervised fine-tuning, and proximal policy optimization was used to transfer the LCM to drug design, similar to OpenAI's ChatGPT and InstructGPT procedures. By combining the SMILES sequence with chemical descriptors, the novel efficacy evaluation model outperformed those of previous studies. After proximal policy optimization, the drug design model generated molecules of which 99.2% had efficacy pIC50 > 7 towards the amyloid precursor protein, with 100% of the generated molecules being valid and novel. This demonstrated the applicability of LCMs in drug discovery, with benefits including less data consumption during fine-tuning. The applicability of LCMs to drug discovery opens the door for larger studies involving reinforcement learning with human feedback, where chemists provide feedback to LCMs and generate higher-quality molecules. LCMs' ability to design similar molecules from datasets paves the way for more accessible, non-patented alternatives to drug molecules.
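
To put the pIC50 > 7 threshold in context, the sketch below converts IC50 values to pIC50 and computes the share of a set above a cutoff. It relies only on the standard definition pIC50 = -log10(IC50 in mol/L); the function names and the example values are hypothetical and are not data from the study.

```python
import math

# pIC50 = -log10(IC50 in mol/L). With IC50 given in nanomolar this becomes
# 9 - log10(IC50_nM), so pIC50 > 7 corresponds to IC50 below 100 nM.

def pic50_from_ic50_nm(ic50_nm):
    """Convert an IC50 in nanomolar to pIC50."""
    return 9.0 - math.log10(ic50_nm)

def fraction_above(pic50_values, threshold=7.0):
    """Share of molecules whose pIC50 strictly exceeds the threshold."""
    hits = sum(1 for value in pic50_values if value > threshold)
    return hits / len(pic50_values)

# Hypothetical scores for four generated molecules (illustrative only).
example_scores = [6.5, 7.2, 8.0, 7.0]
share = fraction_above(example_scores)
```

A figure such as 99.2% in the abstract is this kind of share, computed over the generated molecules using the study's efficacy evaluation model.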

Open Access