Abstract

Key-based substructural fingerprints are an important element of computer-aided drug design techniques. They are invaluable for filtering compound databases, as they allow for the quick rejection of molecules with a low probability of being active. However, this method is flawed, as it does not consider the connections between substructures: when the connections between particular chemical moieties change, the fingerprint representation of the compound remains the same, which makes it difficult to distinguish between active and inactive compounds. In this study, we present a new method of compound representation, substructural connectivity fingerprints (SCFP), which provides information not only about the presence of particular substructures in the molecule but also about the connections between them. This representation was analyzed using a recently developed methodology, extreme entropy machines (EEM). The SCFP can be a valuable addition to virtual screening tools, as it represents compound structure in greater detail and with more specificity, allowing for more accurate classification.

Highlights

  • Modern drug discovery calls for more cost-efficient and effective methods of filtering the vast libraries of chemical compounds in the search for potential drugs

  • In this research, we present substructural connectivity fingerprints (SCFP) as a new method of compound representation

  • Adding inter-substructural connectivity data to the FP yields more specific substructure patterns within compounds, which in turn enables classification algorithms to more accurately filter out inactive compounds that structurally resemble active ones

Introduction

Modern drug discovery calls for more cost-efficient and effective methods of filtering the vast libraries of chemical compounds in the search for potential drugs. The numerical methods of compound screening and selection, collectively called virtual screening (VS) [1], play a major role in the process of computer-aided drug design (CADD) [2]. Key-based substructural fingerprints (FPs) [3] are a popular method of compound representation used in the early stages of a VS cascade. They are based on the occurrences of predefined chemical groups (“keys”) and are encoded as a bit string that can be analyzed using various approaches, such as similarity searching, hierarchical clustering, or activity-based discrimination tests using machine learning (ML) methods. Several key-based FPs are available, differing in the set of keys used for their generation, e.g., Klekota–Roth FP (KR) [4], MACCS FP [5], Substructure FP (SUB) [6], and CACTVS FP (or PubChem FP) [7].
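
To make the bit-string idea concrete, the following is a minimal toy sketch in Python. It is not the SCFP encoding used in this study, whose details are not given in this excerpt. The key list, the moiety-graph representation of a molecule, and the functions `key_fingerprint`, `connectivity_bits`, and `tanimoto` are all hypothetical and chosen only for illustration. The sketch shows that two molecules built from the same moieties but connected differently receive identical key-based fingerprints, and that adding one bit per pair of directly linked key types is one possible way to tell them apart.

```python
from itertools import combinations_with_replacement

# Hypothetical key set. Real key sets (e.g. MACCS) are defined by many
# substructure patterns; four plain labels are used here for illustration.
KEYS = ["phenyl", "amide", "piperazine", "hydroxyl"]

# All unordered pairs of key types, including same-type pairs.
KEY_PAIRS = list(combinations_with_replacement(KEYS, 2))


def key_fingerprint(moieties):
    """Return one bit per key: 1 if that key occurs in the molecule."""
    present = set(moieties.values())
    return [1 if key in present else 0 for key in KEYS]


def connectivity_bits(moieties, links):
    """Return one bit per key pair: 1 if two moieties of those types are linked.

    This pairwise encoding is an assumption made for this sketch only.
    """
    linked = {frozenset((moieties[a], moieties[b])) for a, b in links}
    return [1 if frozenset(pair) in linked else 0 for pair in KEY_PAIRS]


def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two equal-length bit lists."""
    both = sum(1 for a, b in zip(fp_a, fp_b) if a and b)
    either = sum(1 for a, b in zip(fp_a, fp_b) if a or b)
    return both / either if either else 1.0


# Two toy molecules with the same moieties (node id -> key type) ...
moieties = {0: "phenyl", 1: "amide", 2: "piperazine"}
# ... but different connections between them.
links_a = [(0, 1), (1, 2)]  # phenyl - amide - piperazine
links_b = [(0, 2), (2, 1)]  # phenyl - piperazine - amide

fp_a = key_fingerprint(moieties)
fp_b = key_fingerprint(moieties)
ext_a = fp_a + connectivity_bits(moieties, links_a)
ext_b = fp_b + connectivity_bits(moieties, links_b)

print(tanimoto(fp_a, fp_b))    # key-based fingerprints cannot separate the two
print(tanimoto(ext_a, ext_b))  # the extended representation can
```

In this toy setting the key-based similarity is exactly 1.0, while the extended bit strings differ, which mirrors the limitation that motivates encoding substructure connections.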
