Abstract

Current ligand-based machine learning methods in virtual screening rely heavily on molecular fingerprinting for preprocessing, i.e., explicit description of ligands’ structural and physicochemical properties in a vectorized form. Of particular importance to current methods are the extent to which molecular fingerprints describe a particular ligand and what metric sufficiently captures similarity among ligands. In this work, we propose and evaluate methods that do not require explicit feature vectorization through fingerprinting, but, instead, provide implicit descriptors based only on other known assays. Our methods are based upon well known collaborative filtering algorithms used in recommendation systems. Our implicit descriptor method does not require any fingerprint similarity search, which makes the method free of the bias arising from the empirical nature of the fingerprint models. We show that implicit methods significantly outperform traditional machine learning methods, and the main strengths of implicit methods are their resilience to target-ligand sparsity and high potential for spotting promiscuous ligands.

Highlights

  • Virtual screening is an automated computational method of filtering candidate ligands based upon their inferred relationship with a given target

  • In addition we use two popular classification algorithms that are often employed in virtual screening applications, Random Forests (RF) and K-Nearest-Neighbors (KNN) to comparatively asses the performance of collaborative filtering

  • Traditional virtual screening methods in cheminformatics have historically relied on molecular modeling of explicit ligand and protein features

Read more

Summary

Introduction

Virtual screening is an automated computational method of filtering candidate ligands based upon their inferred relationship with a given target. The cost saving advantages of virtual screening must be reconciled with their ability to accurately find ligands with desired properties from relatively few examples. This is especially true for methods that employ traditional machine learning algorithms to predict binding affinity. The ability of traditional machine learning algorithms to effectively predict binding affinities against a specific target depends on the number of ligands assayed for that target. This is problematic for traditional machine learning algorithms because they generally require many training examples before they can predict the outcome reliably.

Methods
Results
Conclusion

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.