Abstract
Structural similarity search among small molecules is a standard tool in molecular classification and in-silico drug discovery. The effectiveness of this general approach depends on how well two problems are addressed. First, the notion of similarity should be chosen to provide the highest level of discrimination between compounds with respect to the bioactivity of interest. Second, the data structure used to perform the search must be very efficient, since the molecular databases of interest contain several million compounds. In this paper we summarize recent applications of the k-nearest-neighbor (k-NN) search method to small-molecule classification. k-NN classification of small molecules is based on selecting the most relevant set of chemical descriptors, which are then compared under the standard Minkowski distance $L_p$. Here we describe how to computationally design the optimal weighted Minkowski distance $wL_p$ that maximizes the discrimination between active and inactive compounds with respect to the bioactivities of interest. Because k-NN classification requires a fast similarity search to predict the bioactivity of a new molecule, we then focus on constructing pruning-based k-NN search data structures for any $wL_p$ distance that minimize similarity search time. The accuracy achieved by the k-NN classifier is better than that of the alternative LDA and MLR approaches and comparable to that of the ANN methods. In terms of running time, the k-NN classifier is considerably faster than the ANN approach, especially on large data sets. Furthermore, the k-NN classifier can quantify the level of bioactivity rather than return only a binary decision, and it can give more insight into the nature of the activity by eliminating descriptors that are unrelated to the activity in question.
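The abstract does not say how the optimal weights are computed, so the following is only a minimal Python sketch of the idea, assuming $wL_p(x, y) = \left(\sum_i w_i |x_i - y_i|^p\right)^{1/p}$. Each candidate weight vector is scored by leave-one-out k-NN accuracy on labelled compounds, and the best vector from a small grid is kept. This exhaustive grid search is a simple stand-in chosen for clarity, not the paper's optimization procedure; the names `wlp`, `loo_accuracy`, `grid_search_weights` and the grid values are hypothetical.

```python
import itertools


def wlp(x, y, w, p=2.0):
    """Weighted Minkowski distance (sum_i w_i * |x_i - y_i|**p) ** (1/p)."""
    return sum(wi * abs(a - b) ** p for a, b, wi in zip(x, y, w)) ** (1.0 / p)


def loo_accuracy(X, labels, w, k=1, p=2.0):
    """Leave-one-out accuracy of a majority-vote k-NN classifier under wL_p."""
    correct = 0
    for i, x in enumerate(X):
        # Rank every other compound by its wL_p distance to compound i.
        ranked = sorted((wlp(x, X[j], w, p), j) for j in range(len(X)) if j != i)
        votes = [labels[j] for _, j in ranked[:k]]
        # Majority label; on a tie, max() keeps the first (smallest) label.
        predicted = max(sorted(set(votes)), key=votes.count)
        correct += predicted == labels[i]
    return correct / len(X)


def grid_search_weights(X, labels, grid=(0.0, 0.5, 1.0), k=1, p=2.0):
    """Try every weight vector with entries from `grid`; keep the best one."""
    best_w, best_acc = None, -1.0
    for w in itertools.product(grid, repeat=len(X[0])):
        if not any(w):
            continue  # all-zero weights make every distance 0
        acc = loo_accuracy(X, labels, w, k, p)
        if acc > best_acc:  # strict '>' keeps the first best vector found
            best_w, best_acc = list(w), acc
    return best_w, best_acc
```

A weight of 0 switches a descriptor off, which is one way a weighted distance can discard descriptors unrelated to the activity: on a toy set where only the first descriptor separates actives from inactives, the search assigns weight 0 to the second, noisy descriptor. The search cost grows as $|\text{grid}|^d$ for $d$ descriptors, so this brute-force version is only practical for a handful of descriptors.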
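The abstract does not describe its pruning-based search structure, so here is a hedged Python sketch of one common pruning idea: pivot-based filtering with the triangle inequality. It is not necessarily the data structure designed in the paper. It assumes $wL_p(x, y) = \left(\sum_i w_i |x_i - y_i|^p\right)^{1/p}$ with $p \ge 1$ and non-negative weights, so the distance obeys the triangle inequality and pruning never discards a true neighbour. The names `wlp`, `PivotIndex`, `classify`, the naive choice of the first points as pivots, and the activity threshold are all hypothetical. `classify` returns both a binary decision and a mean neighbour activity, to illustrate quantifying the level of bioactivity.

```python
import heapq


def wlp(x, y, w, p=2.0):
    """Weighted Minkowski distance (sum_i w_i * |x_i - y_i|**p) ** (1/p).

    With p >= 1 and non-negative weights this obeys the triangle inequality,
    which is what makes the pruning in PivotIndex exact. w_i = 0 ignores
    descriptor i entirely.
    """
    return sum(wi * abs(a - b) ** p for a, b, wi in zip(x, y, w)) ** (1.0 / p)


class PivotIndex:
    """Hypothetical pivot table for exact k-NN search under one fixed wL_p."""

    def __init__(self, data, w, p=2.0, n_pivots=2):
        if p < 1 or any(wi < 0 for wi in w):
            raise ValueError("pruning needs p >= 1 and non-negative weights")
        self.data, self.w, self.p = data, w, p
        self.pivots = data[:n_pivots]  # naive pivot choice (an assumption)
        # table[i][j] = d(data[i], pivot j), computed once when the index is built.
        self.table = [[wlp(x, pv, w, p) for pv in self.pivots] for x in data]
        self.full_distance_calls = 0  # scan distances evaluated (pivot distances excluded)

    def knn(self, q, k):
        """Return the k nearest (distance, index) pairs, closest first."""
        if k < 1:
            raise ValueError("k must be at least 1")
        dq = [wlp(q, pv, self.w, self.p) for pv in self.pivots]
        heap = []  # current best k as a max-heap of (-distance, index)
        for i, x in enumerate(self.data):
            # Triangle inequality: |d(q, pv) - d(x, pv)| <= d(q, x) for every pivot pv.
            lower = max((abs(a - b) for a, b in zip(dq, self.table[i])), default=0.0)
            if len(heap) == k and lower >= -heap[0][0]:
                continue  # x cannot beat the current k-th neighbour: skip it
            self.full_distance_calls += 1
            d = wlp(q, x, self.w, self.p)
            if len(heap) < k:
                heapq.heappush(heap, (-d, i))
            elif d < -heap[0][0]:
                heapq.heapreplace(heap, (-d, i))
        return sorted((-neg_d, i) for neg_d, i in heap)


def classify(index, activity, q, k=3, threshold=0.0):
    """Return (is_active, estimated activity level) for query compound q.

    activity[i] is a numeric bioactivity value for compound i; treating a
    compound as active when its value exceeds `threshold` is an assumption.
    """
    neighbours = [activity[i] for _, i in index.knn(q, k)]
    n_active = sum(a > threshold for a in neighbours)
    level = sum(neighbours) / len(neighbours)  # mean activity of the k neighbours
    return n_active * 2 > len(neighbours), level  # strict majority vote
```

The pivot table costs one distance per compound and pivot at build time; in exchange, a query can skip compounds whose lower bound already exceeds the current k-th best distance. In a small example with six two-descriptor compounds, a query near one cluster skips the two far-away compounds entirely. If $p < 1$, $wL_p$ is no longer a metric and this bound is not guaranteed, which is why the constructor rejects it.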
More From: ACM SIGKDD Explorations Newsletter