Abstract

Two-dimensional (2D) chemical fingerprints are widely used as binary features for quantifying the structural similarity of chemical compounds, an important step in similarity-based virtual screening (VS). Here, using an eigenvalue-based entropy approach, we identified as "related" those 2D fingerprints that contribute little or nothing to the shape of the eigenvalue distribution of the feature matrix. We then examined the degree to which these related fingerprints influence molecular similarity scores calculated with the Tanimoto coefficient. Our analysis identified many related fingerprints in publicly available fingerprint schemes and showed that their presence in the feature set can substantially affect similarity scores and bias the outcome of molecular similarity analysis. Our results have implications for the optimal selection of 2D fingerprints for compound similarity analysis and for the identification of potential hits with the target biological activity in VS.
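
For binary fingerprints, the Tanimoto coefficient is T(A, B) = c / (a + b - c), where a and b are the numbers of bits set in A and B and c is the number of bits set in both. The short Python sketch below (the function name `tanimoto` and the toy fingerprints are illustrative, not taken from the paper) shows how a single related bit, here a hypothetical exact copy of an existing bit, can shift the score for the same pair of compounds.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient c / (a + b - c) for two equal-length 0/1 fingerprints."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    a = sum(fp_a)                                  # bits set in A
    b = sum(fp_b)                                  # bits set in B
    c = sum(x & y for x, y in zip(fp_a, fp_b))     # bits set in both
    union = a + b - c
    return c / union if union else 0.0             # assumed convention for two empty fingerprints

query = [1, 1, 0, 0]
hit = [1, 0, 1, 0]
print(tanimoto(query, hit))        # 0.3333333333333333  (c=1, a=2, b=2)

# Append a hypothetical "related" bit that simply copies bit 0 in both fingerprints.
query_r = query + [query[0]]
hit_r = hit + [hit[0]]
print(tanimoto(query_r, hit_r))    # 0.5  (c=2, a=3, b=3)
```

Nothing about the two toy compounds changed, yet the score rose from 1/3 to 1/2 because the shared feature was counted twice.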

Highlights

  • In this study, we defined related fingerprints as those that do not contribute to the shape of the eigenvalue distribution of the original fingerprint feature matrix and are thought to possess a high degree of …
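
One way to make "contribution to the shape of the eigenvalue distribution" concrete is a leave-one-out entropy score: compute the Shannon entropy of the normalized eigenvalues of XᵀX (the squared singular values of the compound-by-fingerprint matrix X), then measure how much it changes when each fingerprint column is removed. The sketch below assumes this formulation, the function names `eigen_entropy` and `entropy_contributions`, and the tolerance `TOL` used to flag related columns; it is an illustration, not necessarily the authors' exact procedure.

```python
import numpy as np

def eigen_entropy(X):
    """Shannon entropy of the normalized eigenvalue spectrum of X^T X."""
    s = np.linalg.svd(np.asarray(X, dtype=float), compute_uv=False)
    ev = s ** 2                      # eigenvalues of X^T X are the squared singular values of X
    total = ev.sum()
    if total == 0:
        return 0.0                   # an all-zero matrix has no spectrum to speak of
    p = ev / total                   # eigenvalue distribution (sums to 1)
    p = p[p > 0]                     # treat 0 * log(0) as 0
    return float(-(p * np.log(p)).sum())

def entropy_contributions(X):
    """Leave-one-out contribution of each fingerprint column to the spectral entropy."""
    X = np.asarray(X, dtype=float)
    full = eigen_entropy(X)
    return np.array([full - eigen_entropy(np.delete(X, j, axis=1))
                     for j in range(X.shape[1])])

# Toy compound-by-fingerprint matrix (20 compounds x 6 bits); bit 5 is never set.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(20, 6)).astype(float)
X[:, 5] = 0.0

ce = entropy_contributions(X)
TOL = 1e-6                           # hypothetical tolerance for "little to no contribution"
related = np.flatnonzero(np.abs(ce) < TOL)
print(np.round(ce, 4))
print(related)                       # includes index 5, the bit that is never set
```

A never-set bit is the trivial case: removing it leaves the nonzero spectrum unchanged. Whether X should be centered or scaled first, and how "little contribution" is thresholded, are choices that would need to be checked against the paper.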

Summary

Introduction

Kuwahara and Gao, J Cheminform (2021) 13:27

Virtual screening (VS) is a computational approach widely used as a cost-effective alternative to traditional high-throughput screening for selecting initial hits in the search for drugs with a given biological activity [1, 2].

Among the most commonly used fingerprint schemes for similarity quantification is the Molecular ACCess System (MACCS) [6], which has been reported to cover many useful 2D features for virtual screening [7]. While these predefined fingerprint dictionaries are easy to use, previous studies have shown that selecting relevant 2D fingerprints from the original set yields better performance [8–10]. These feature selection methods typically operate in a supervised machine learning setting: they select a subset of relevant 2D fingerprints intended to improve generalization when discriminating compounds with a given biological activity from those without it. Such supervised approaches depend on having enough compounds with known activity to learn from. Yet had the number of known bioactive compounds for the target been large enough to begin with, a pipeline to discover more of the same would probably not have warranted a large investment.
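
In similarity-based VS, library compounds are typically ranked by their similarity to a query compound, and the top-ranked ones are taken as initial hits. The sketch below assumes Tanimoto scoring over NumPy 0/1 arrays; `top_k_hits` and its `keep_bits` argument (a hypothetical subset of fingerprint indices standing in for a feature-selection step) are illustrative names only.

```python
import numpy as np

def tanimoto_to_query(query, library):
    """Tanimoto similarity of one 0/1 query vector to every row of a 0/1 library matrix."""
    q = np.asarray(query, dtype=bool)
    L = np.asarray(library, dtype=bool)
    common = (L & q).sum(axis=1)          # c: bits set in both
    union = (L | q).sum(axis=1)           # a + b - c: bits set in either
    return np.divide(common, union, out=np.zeros(len(L)), where=union > 0)

def top_k_hits(query, library, k, keep_bits=None):
    """Return indices and scores of the k library compounds most similar to the query."""
    q = np.asarray(query)
    L = np.asarray(library)
    if keep_bits is not None:             # hypothetical feature-selection step
        q = q[keep_bits]
        L = L[:, keep_bits]
    scores = tanimoto_to_query(q, L)
    order = np.argsort(-scores, kind="stable")[:k]
    return order, scores[order]

query = [1, 1, 0, 0, 1]
library = [[1, 0, 1, 0, 1],
           [1, 1, 0, 0, 0],
           [0, 0, 1, 1, 0]]
print(top_k_hits(query, library, k=2))                          # all five bits
print(top_k_hits(query, library, k=3, keep_bits=[0, 1, 2, 3]))  # bit 4 dropped
```

With all five bits, compound 1 scores 2/3 and compound 0 scores 1/2; dropping bit 4 changes these to 1 and 1/3. The ranking happens to stay the same here, but the scores themselves depend on which bits are kept.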
