Abstract
Background: Drug discovery is a multi-stage process that comprises two costly major steps: pre-clinical research and clinical trials. Among its stages, lead optimization easily consumes more than half of the pre-clinical budget. We propose a combined machine learning and molecular modeling approach that partially automates the lead optimization workflow in silico, providing suggestions for modification hot spots.

Results: The initial data are collected with physics-based molecular dynamics simulations. Contact matrices are calculated as the preliminary features extracted from the simulations. To take advantage of the temporal information in the simulations, we augment the contact-matrix data with a representation of temporal dynamism, which is then modeled with an unsupervised convolutional variational autoencoder (CVAE). Finally, conventional and CVAE-based clustering methods are compared using metrics that rank the submolecular structures and propose potential candidates for lead optimization.

Conclusion: With no need for extensive structure-activity data, our method provides new hints for drug modification hot spots, which can be used to improve drug potency and reduce lead optimization time. It can potentially become a valuable tool for medicinal chemists.
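To make the contact-matrix step more concrete, the following is a minimal sketch, not the authors' implementation. It assumes each simulation frame is a list of (x, y, z) coordinates and that two atoms are "in contact" when they lie within a distance cutoff. The function names `contact_matrix` and `contact_frequency` and the 4.0 cutoff are hypothetical choices made for illustration. The abstract does not say how contacts are defined or how the temporal dynamism representation is built, so the per-pair contact frequency here is only a simple stand-in for a time-aware feature.

```python
import math


def contact_matrix(coords, cutoff=4.0):
    """Binary contact matrix for one frame.

    coords: list of (x, y, z) tuples, one per atom.
    cutoff: hypothetical distance threshold (same units as coords).
    The diagonal is left at 0, so an atom is not counted as its own contact.
    """
    n = len(coords)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                # Contacts are symmetric, so fill both halves.
                matrix[i][j] = matrix[j][i] = 1
    return matrix


def contact_frequency(frames, cutoff=4.0):
    """Fraction of frames in which each atom pair is in contact.

    frames: list of frames, each with the same number of atoms.
    This is an illustrative summary over time, not the paper's
    temporal dynamism representation.
    """
    n = len(frames[0])
    totals = [[0] * n for _ in range(n)]
    for coords in frames:
        frame_matrix = contact_matrix(coords, cutoff)
        for i in range(n):
            for j in range(n):
                totals[i][j] += frame_matrix[i][j]
    return [[count / len(frames) for count in row] for row in totals]


if __name__ == "__main__":
    # Two toy frames with three atoms each (made-up coordinates).
    frame_a = [(0, 0, 0), (3, 0, 0), (10, 0, 0)]
    frame_b = [(0, 0, 0), (6, 0, 0), (9, 0, 0)]
    for row in contact_frequency([frame_a, frame_b]):
        print(row)
```

In a real workflow the coordinates would come from a trajectory reader and the per-frame matrices would be stacked into a tensor before any autoencoder is applied; both steps are outside the scope of this sketch.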
Highlights
At a time of global health crisis, drug discovery is of the utmost importance for returning society to normal
We propose a computational method, coined Clustered Atom Subtypes aidEd Lead Optimization (CASTELO), that identifies modifiable submolecular moieties in a lead molecule to narrow down the substitution sites to a few possibilities
A convolutional variational autoencoder (CVAE) was adopted to compress the dynamism tensors into a latent space before clustering the data with HDBSCAN
Summary
At a time of global health crisis, drug discovery is of the utmost importance for returning society to normal. Despite research and development expenditure growing every year [1, 2], the number of drugs approved by the FDA each year has largely stagnated since 1993 [3]. In 2018 there were a total of 3437 FDA-approved small-molecule and large-molecule drugs or therapeutics [4], with a yearly addition of only ∼1.2% (2014–2018 average). Drug discovery is a multi-stage process that comprises two costly major steps: pre-clinical research and clinical trials. Lead optimization consumes more than half of the pre-clinical budget. We propose a combined machine learning and molecular modeling approach that partially automates the lead optimization workflow in silico, providing suggestions for modification hot spots.