Abstract

The advent of computational drug discovery holds the promise of significantly reducing the effort of experimentalists, along with monetary cost. More generally, predicting the binding of small organic molecules to biological macromolecules has far-reaching implications for a range of problems, including metabolomics. However, problems such as predicting the bound structure of a protein-ligand complex along with its affinity have proven to be an enormous challenge. In recent years, machine learning-based methods have proven to be more accurate than older methods, many based on simple linear regression. Nonetheless, there remains room for improvement, as these methods are often trained on a small set of features, with a single functional form for any given physical effect, and often with little mention of the rationale behind choosing one functional form over another. Moreover, it is not entirely clear why one machine learning method is favored over another. In this work, we endeavor to undertake a comprehensive effort towards developing high-accuracy, machine-learned scoring functions, systematically investigating the effects of machine learning method and choice of features, and, when possible, providing insights into the relevant physics using methods that assess feature importance. Here, we show synergism among disparate features, yielding adjusted R2 with experimental binding affinities of up to 0.871 on an independent test set and enrichment for native bound structures of up to 0.913. When purely physical terms that model enthalpic and entropic effects are used in the training, we use feature importance assessments to probe the relevant physics and hopefully guide future investigators working on this and other computational chemistry problems.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.