Abstract

In the present report, support vector machines (SVM), artificial neural networks (ANN), Bayesian networks (BNs) and k-nearest neighbor (k-NN) are, for the first time, applied to and compared on two "in-house" datasets to describe tyrosinase inhibitory activity from molecular structure. The first dataset, Data I, is used to identify tyrosinase inhibitors (TIs) and comprises 701 active and 728 inactive compounds. The second, Data II, consists of active compounds and is used to estimate the potency of TIs. The 2D TOMOCOMD-CARDD atom-based quadratic indices are used as molecular descriptors. The derived models show encouraging results, with areas under the Receiver Operating Characteristic curve (AURC) on the test set above 0.943 and 0.846 for Data I and Data II, respectively. Multiple comparison tests are carried out to compare model performance. They show that the machine learning (ML) techniques improve on the statistical ones (see Chemometr. Intell. Lab. Syst. 2010, 104, 249), and in some cases these improvements are statistically significant. The tests also demonstrate that k-NN, despite being a rather simple approach, performs best on both datasets. The results suggest that ML-based models could help improve virtual screening procedures, and that combining these different techniques can make data mining of chemical databases more practical for discovering novel TIs as possible depigmenting agents.
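The abstract reports AURC values for classifiers built on molecular descriptors, with k-NN performing best. As a minimal sketch of how such a figure can be obtained for a k-NN classifier, the Python below scores each test compound by the fraction of its k nearest training compounds that are active, then computes the area under the ROC curve with the Mann-Whitney statistic (the probability that a random active outscores a random inactive, with ties counting one half). The two-dimensional descriptor vectors, the Euclidean distance, k = 3 and the function names are hypothetical illustrations, not the authors' TOMOCOMD-CARDD descriptors, data or settings.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two descriptor vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_score(train_X: List[Sequence[float]], train_y: List[int],
              query: Sequence[float], k: int = 3) -> float:
    """Fraction of the k nearest training compounds labelled active (1)."""
    ranked = sorted((euclidean(x, query), y) for x, y in zip(train_X, train_y))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via the Mann-Whitney statistic; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive compound")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy descriptors: 1 = active (TI), 0 = inactive.
train_X = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.25],
           [0.9, 0.8], [0.8, 0.95], [0.85, 0.9]]
train_y = [0, 0, 0, 1, 1, 1]
test_X = [[0.12, 0.18], [0.88, 0.85], [0.2, 0.3], [0.7, 0.75]]
test_y = [0, 1, 0, 1]

scores = [knn_score(train_X, train_y, q, k=3) for q in test_X]
print(scores, auroc(scores, test_y))  # [0.0, 1.0, 0.0, 1.0] 1.0
```

On this deliberately well-separated toy set the AURC is 1.0. Real descriptor sets overlap, which is why the reported test-set values are below 1. The Mann-Whitney form is used because it gives the same area as integrating the ROC curve while avoiding explicit threshold sweeps.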
