Abstract

We present three simple yet effective data mining techniques for lazy structure-activity relationships (SARs) of noncongeneric compounds. In lazy SARs, classifications are tailored specifically to each test compound, which makes it possible to exploit the structure of the test compound itself. In our case, we derive its substructures and use them to determine similar structures. To obtain a well-balanced and representative set of structural descriptors, we enrich this set with strongly activating or deactivating fragments from the training set and subsequently remove redundant fragments. Finally, we perform k-Nearest Neighbor classification for several values of k and take a vote among the resulting predictions. These techniques (enrichment, redundancy removal, and voting) are integrated into the system iSAR (instance-based structure-activity relationships) and tested individually to show their relative contributions to the system's performance. Experiments on three data sets indicate that this simple and lightweight approach performs at least as well as other, more complex approaches.
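
The voting step can be illustrated with a minimal sketch. It is not the iSAR implementation: the abstract does not name a similarity measure, so Tanimoto similarity over fragment sets is assumed here, and the function names (`tanimoto`, `knn_predict`, `vote_over_k`) and the default values of k are hypothetical choices made for illustration only.

```python
from collections import Counter


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two fragment sets (assumed measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def knn_predict(test_frags, training, k):
    """Return the majority class among the k most similar training compounds.

    `training` is a list of (fragment_set, label) pairs.
    """
    ranked = sorted(
        training,
        key=lambda item: tanimoto(test_frags, item[0]),
        reverse=True,
    )
    labels = [label for _, label in ranked[:k]]
    return Counter(labels).most_common(1)[0][0]


def vote_over_k(test_frags, training, ks=(1, 3, 5)):
    """Run k-NN once per value of k and take a majority vote over the predictions."""
    predictions = [knn_predict(test_frags, training, k) for k in ks]
    return Counter(predictions).most_common(1)[0][0]
```

Using odd values of k and an odd number of them avoids ties for two-class problems. In this sketch, any remaining ties are broken by the order in which labels are first encountered.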
