Abstract

Similarity-search methods using molecular fingerprints are an important tool for ligand-based virtual screening. A wide variety of fingerprints exists, and their performance, usually assessed in retrospective benchmarking studies using data sets with known actives and known or assumed inactives, depends largely on the validation data sets and the similarity measure used. Comparing new methods to existing ones in any systematic way is difficult because standard data sets and evaluation procedures are lacking. Here, we present a standard platform for the benchmarking of 2D fingerprints. The open-source platform contains all source code, structural data for the actives and inactives used (drawn from three publicly available collections of data sets), and lists of randomly selected query molecules to be used for statistically valid comparisons of methods. This allows results to be reproduced exactly and compared in future studies. Results for 12 standard fingerprints and two simple baseline fingerprints, assessed by seven evaluation methods, are shown together with the correlations between methods. High correlations were found between the 12 standard fingerprints, and a careful statistical analysis showed that only the two baseline fingerprints differed from the others in a statistically significant way. High correlations were also found between six of the seven evaluation methods, indicating that, despite their apparent differences, many of these methods are similar to each other.

Highlights

  • The concept of molecular similarity is often used in ligand-based virtual screening (VS), where known actives are used to find new molecules to test [1]

  • Using the benchmarking platform, the performance of 14 2D fingerprints, covering dictionary-based, path-based and circular types, was assessed over 88 targets from three publicly available collections of data sets

  • The platform uses the open-source cheminformatics toolkit RDKit to calculate fingerprints and similarities but, because of its three-stage design, data generated by other sources can be fed in at the validation or analysis stages

Summary

Introduction

The concept of molecular similarity is often used in ligand-based virtual screening (VS), where known actives are used to find new molecules to test [1]. The choice of molecular description used to calculate the similarity is not trivial and can vary depending on the compound selection and/or target [5,6,7]. A variety of descriptors exist, which can be divided into two large groups depending on whether they consider only the 2D structure (topology) of a molecule or also include 3D information. A standard and computationally efficient abstract representation is the molecular fingerprint [8], in which structural features are represented by either bits in a bit string or counts in a count vector.
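To make the two fingerprint representations concrete, the following is a minimal, toolkit-independent sketch of how a similarity search over bit and count fingerprints might look. It assumes the Tanimoto coefficient as the similarity measure (a common choice, not one named in the text above), and all function names, bit indices, feature labels and molecule names are hypothetical and chosen only for illustration. It is not the benchmarking platform's own code.

```python
from collections import Counter


def tanimoto_bits(a: set[int], b: set[int]) -> float:
    """Tanimoto similarity of two bit fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def tanimoto_counts(a: Counter, b: Counter) -> float:
    """Tanimoto similarity of two count fingerprints (feature -> count)."""
    keys = a.keys() | b.keys()
    shared = sum(min(a[k], b[k]) for k in keys)  # Counter returns 0 for missing keys
    total = sum(max(a[k], b[k]) for k in keys)
    return shared / total if total else 0.0


def rank_by_similarity(query: set[int], library: dict[str, set[int]]) -> list[tuple[str, float]]:
    """Rank library molecules by decreasing similarity to one known active."""
    scored = [(name, tanimoto_bits(query, fp)) for name, fp in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy data: bit indices and molecule names are made up for illustration.
query_fp = {1, 4, 7, 9}
library = {
    "mol_a": {1, 4, 7, 8},
    "mol_b": {2, 3, 5},
    "mol_c": {1, 4, 7, 9, 12},
}
ranking = rank_by_similarity(query_fp, library)

# Toy count fingerprints keyed by hypothetical feature labels.
count_sim = tanimoto_counts(
    Counter({"C-C": 3, "C-N": 1}),
    Counter({"C-C": 2, "C-O": 1}),
)
```

In a virtual-screening setting the query would be a known active, and the top of the ranked list would be the molecules proposed for testing. The count-based variant differs only in that it compares how often each feature occurs, where the bit-based variant compares only whether it occurs.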

