Abstract

Fingerprints are bit string representations of molecular structure and properties and are widely used to search databases for active molecules. It is well established that molecular complexity and size effects lead to systematic errors in fingerprint similarity searching. For example, several studies have highlighted the preferential recognition of large compounds, irrespective of their activity, when complex molecules are used as templates for fingerprint calculations. To systematically study the complexity relationships between reference and database molecules that are relevant for practical fingerprint similarity searching, we designed sets of active molecules of increasing fingerprint bit density relative to average database compounds and potential hits, and carried out systematic similarity search trials. We find that search performance decreases as the complexity of the reference molecules increases. However, a key finding is that random deletion of bits that are set on in the fingerprints of complex reference molecules generally improves compound recall, even though these random bit density reductions also cause a loss of chemical information content. These results suggest a general search strategy for fingerprints that are sensitive to complexity effects when optimized active compounds are used as reference molecules.
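The random bit deletion described above can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not state: fingerprints are modeled as sets of on-bit positions, similarity is measured with the Tanimoto coefficient (a common choice, but not confirmed here), and the function names `bit_density`, `tanimoto`, and `randomly_delete_bits` are hypothetical helpers, not the authors' implementation.

```python
import random
from typing import Set


def bit_density(fp: Set[int], n_bits: int) -> float:
    """Fraction of bit positions that are set on in a fingerprint."""
    return len(fp) / n_bits


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bits.

    Assumption: the abstract does not name the similarity metric.
    """
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def randomly_delete_bits(fp: Set[int], fraction: float,
                         rng: random.Random) -> Set[int]:
    """Return a copy of fp with a random fraction of its on-bits removed."""
    n_remove = int(len(fp) * fraction)
    # sorted() gives rng.sample a sequence with a reproducible order
    removed = set(rng.sample(sorted(fp), n_remove))
    return fp - removed


# Illustrative data only: a dense "complex" reference fingerprint
# and a sparser database compound in a 100-bit fingerprint space.
N_BITS = 100
reference = set(range(0, 60))
database_compound = set(range(0, 20))

rng = random.Random(0)
reduced = randomly_delete_bits(reference, 0.5, rng)

print("density before:", bit_density(reference, N_BITS))
print("density after: ", bit_density(reduced, N_BITS))
print("similarity before:", tanimoto(reference, database_compound))
print("similarity after: ", tanimoto(reduced, database_compound))
```

The sketch only shows the mechanics of lowering the bit density of a reference fingerprint. Whether recall improves depends on the fingerprint design and the compound sets, which is what the search trials in the study evaluate.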
