Abstract
A graph-based genetic algorithm (GA) is used to identify molecules (ligands) with high absolute docking scores as estimated by the Glide software package, starting from randomly chosen molecules from the ZINC database, for four different targets: Bacillus subtilis chorismate mutase (CM), human β2-adrenergic G protein-coupled receptor (β2AR), the DDR1 kinase domain (DDR1), and β-cyclodextrin (BCD). By the combined use of functional group filters and a score modifier based on a heuristic synthetic accessibility (SA) score, our approach identifies between ca. 500 and 6,000 structurally diverse molecules with scores better than known binders by screening a total of 400,000 molecules starting from 8,000 randomly selected molecules from the ZINC database. Screening 250,000 molecules from the ZINC database identifies significantly more molecules with better docking scores than known binders, with the exception of CM, where the conventional screening approach only identifies 60 compounds compared to 511 with GA+Filter+SA. In the case of β2AR and DDR1, the GA+Filter+SA approach finds significantly more molecules with docking scores lower than −9.0 and −10.0. The GA+Filter+SA docking methodology is thus effective in generating a large and diverse set of synthetically accessible molecules with very good docking scores for a particular target. An early incarnation of the GA+Filter+SA approach was used to identify potential binders to the COVID-19 main protease and submitted to the early stages of the COVID Moonshot project, a crowd-sourced initiative to accelerate the development of a COVID antiviral.
Highlights
Docking of molecules to protein targets is an important part of computer-aided drug discovery (Kitchen et al, 2004).
In this paper we show that a non-fragment-based genetic algorithm (GA) (Jensen, 2019) can be used to find more synthetically accessible molecules with good Glide (Friesner et al, 2004; Halgren et al, 2004) docking scores than conventional high-throughput virtual screening (HTVS) of libraries.
As noted by Gao & Coley (2020) and Brown et al (2019), generative models in general and GAs in particular often generate molecules that contain bonds known to be chemically unstable or that are difficult to synthesise. We address this issue in three ways: we use Walters' rd_filters code (following Brown et al (2019)), a score modifier suggested by Gao & Coley (2020) based on a heuristic synthetic accessibility (SA) score (Ertl & Schuffenhauer, 2009), and a combination of the two.
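The combination of filters and an SA-based score modifier can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the Gaussian form of the decay, and the `threshold` and `width` values are illustrative assumptions. The sketch also assumes that more negative docking scores are better, that the SA score runs from 1 (easy to make) to 10 (hard to make), and that the filter verdict is supplied as a boolean computed elsewhere (for example by rd_filters).

```python
import math


def sa_modifier(sa_score, threshold=3.0, width=1.0):
    """Weight in (0, 1]: 1.0 at or below `threshold`, Gaussian decay above it.

    `threshold` and `width` are hypothetical placeholders, not values
    taken from the paper.
    """
    if sa_score <= threshold:
        return 1.0
    return math.exp(-0.5 * ((sa_score - threshold) / width) ** 2)


def modified_score(docking_score, sa_score, passes_filters):
    """Fitness to maximise for one candidate molecule.

    The docking score is negated so that better (more negative) scores
    give a larger fitness, clipped at zero, and then scaled by the SA
    weight. Molecules rejected by the functional group filters get zero.
    """
    if not passes_filters:
        return 0.0
    return max(0.0, -docking_score) * sa_modifier(sa_score)


if __name__ == "__main__":
    # Same docking score, increasingly hard-to-make molecules.
    for sa in (2.0, 4.0, 6.0):
        print(sa, round(modified_score(-9.0, sa, True), 3))
```

With a modifier of this kind, a molecule that docks well but is hard to synthesise is penalised rather than discarded, whereas a molecule that fails the functional group filters is excluded outright.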
Summary
Docking of molecules to protein targets is an important part of computer-aided drug discovery (Kitchen et al, 2004). Recent studies have shown that HTVS of hundreds of millions (Lyu et al, 2019) or even billions of molecules (Grebner et al, 2019) is possible. Even such large numbers pale in comparison with the estimated 10^60 small molecules that make up chemical space. Most work in this area as it relates to drug discovery has used evolutionary search algorithms to address this problem, and such methods have been applied to docking.