Integrated data-driven and experimental approaches to accelerate lead optimization targeting SARS-CoV-2 main protease.

Rohith Anand Varikoti, Neeraj Kumar, Mowei Zhou, Kristoffer R. Brandvold, Agustin Kruel, Chathuri J. Kombala, Katherine J. Schultz

doi:10.1007/s10822-023-00509-1

Abstract

Identification of potential therapeutic candidates can be expedited by integrating computational modeling with domain-aware machine learning (ML) models, followed by experimental validation in an iterative manner. Generative deep learning models can generate thousands of new candidates; however, their physicochemical and biochemical properties are typically not fully optimized. Using our recently developed deep learning models and a scaffold as a starting point, we generated tens of thousands of compounds for SARS-CoV-2 Mpro that preserve the core scaffold. We applied several computational tools to the generated candidates, including structural alert and toxicity analysis, high-throughput virtual screening, ML-based 3D quantitative structure-activity relationships, multi-parameter optimization, and graph neural networks, to predict biological activity and binding affinity in advance. From these combined computational efforts, eight promising candidates were selected and tested experimentally using native mass spectrometry and FRET-based functional assays. Two of the tested compounds, with quinazoline-2-thiol and acetylpiperidine core moieties, showed IC50 values in the low micromolar range: [Formula: see text] μM and 3.41 ± 0.0015 μM, respectively. Molecular dynamics simulations further indicate that binding of these compounds results in allosteric modulations within chain B and the interface domains of Mpro. Our integrated approach provides a platform for data-driven lead optimization with rapid characterization and experimental validation in a closed loop that could be applied to other potential protein targets.

Full Text