Abstract

Many traditional quantitative structure-activity relationship (QSAR) models rely on correlations with high-dimensional, highly variable molecular features in their raw form, which limits their ability to generalize even when large training sets are used. They also lack elements of causality and reasoning. With these issues in mind, we developed a method for learning higher-level abstract representations of the effects of interactions between molecular features and biology. We call these representations reason vectors. Each is composed of a series of computed activities of substructures obtained from stepwise reconstruction of the molecule. This representation is very different from fingerprints, which are composed of molecular features directly. Reason vectors capture the reasons for the bioactivity of chemicals (or the absence thereof) in an abstract form, uncover causality in interactions between chemical features, and generalize beyond specific chemical classes or bioactivities. They contain only a few key attributes and are much smaller than molecular fingerprints. They allow vague, conceptual similarity searches that are less susceptible to failure on novel combinations of features in the query molecule and more likely to identify reasons for activity in chemical classes that are absent from the training data. Reason vectors can be compared with each other, and their activity can be computed by matching them with vectors from molecules with known bioactivity. A single molecule produces as many reason vectors as it has heavy atoms, and a simple count of these vectors across a series of activity ranges is all that is needed to predict its bioactivity. Thus, the prediction method involves no gradient optimization or statistical fitting.
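
The counting step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how reason vectors are built or matched, so the sketch assumes the vectors are already available as tuples of numbers, matches each query vector to its nearest reference vector by Euclidean distance, and uses hypothetical activity-range boundaries. The names `nearest_activity`, `bin_index`, and `predict_by_counting` are invented for illustration.

```python
from collections import Counter
from math import dist


def nearest_activity(query, reference):
    """Activity of the reference vector closest to `query`.

    Euclidean distance is an assumption; the abstract only says that
    vectors are matched against vectors with known bioactivity.
    """
    _, activity = min(reference, key=lambda item: dist(query, item[0]))
    return activity


def bin_index(activity, edges):
    """Index of the activity range that contains `activity`.

    `edges` are ascending boundaries (hypothetical values), so
    len(edges) + 1 ranges exist in total.
    """
    for i, edge in enumerate(edges):
        if activity < edge:
            return i
    return len(edges)


def predict_by_counting(query_vectors, reference, edges):
    """Count matched vectors per activity range and return the fullest range.

    No gradient optimization or fitting is involved, only matching and
    counting. Ties go to the higher range (an arbitrary choice).
    """
    counts = Counter(
        bin_index(nearest_activity(q, reference), edges) for q in query_vectors
    )
    best_bin, _ = max(counts.items(), key=lambda kv: (kv[1], kv[0]))
    return best_bin, dict(counts)


# Made-up data: three reference vectors with known activity, and one query
# molecule with four heavy atoms, hence four reason vectors.
reference = [
    ((0.1, 0.2, 0.1), 0.05),
    ((0.8, 0.9, 0.7), 0.92),
    ((0.5, 0.4, 0.6), 0.55),
]
edges = [0.33, 0.66]  # ranges: 0 = low, 1 = medium, 2 = high
query_vectors = [
    (0.75, 0.85, 0.80),
    (0.70, 0.90, 0.60),
    (0.45, 0.50, 0.55),
    (0.82, 0.88, 0.72),
]
print(predict_by_counting(query_vectors, reference, edges))
```

With this toy data, three of the four query vectors match the high-activity reference vector and one matches the medium one, so the molecule is assigned to the highest range. A real system would presumably replace the nearest-neighbor match and the range boundaries with whatever the full paper specifies.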
