Weighted Tanimoto Extreme Learning Machine with Case Study in Drug Discovery

Wojciech Marian Czarnecki

doi:10.1109/mci.2015.2437312

Abstract

Machine learning methods are becoming more and more popular in the field of computer-aided drug design. The specific data characteristic, including sparse, binary representation as well as noisy, imbalanced datasets, presents a challenging binary classification problem. Currently, two of the most successful models in such tasks are the Support Vector Machine (SVM) and Random Forest (RF). In this paper, we introduce a Weighted Tanimoto Extreme Learning Machine (T-WELM), an extremely simple and fast method for predicting chemical compound biological activity and possibly other data with discrete, binary representation. We show some theoretical properties of the proposed model including the ability to learn arbitrary sets of examples. Further analysis shows numerous advantages of T-WELM over SVMs, RFs and traditional Extreme Learning Machines (ELM) in this particular task. Experiments performed on 40 large datasets of thousands of chemical compounds show that T-WELMs achieve much better classification results and are at the same time faster in terms of both training time and further classification than both ELM models and other state-of-the-art methods in the field.

Full Text