Abstract

In drug discovery, machine learning is widely used to classify molecules as active or inactive against a particular target. The vast majority of these methods (supervised learning) need a training set of objects (molecules) to develop a decision rule that can be used to classify new entities (the test set) into one of the two classes [1]. Many studies have searched for optimal learning parameters and examined their impact on classification effectiveness [2,3]. Unfortunately, there are no data showing how the actives/inactives ratio used for model training influences the efficiency of identifying new active compounds. Therefore, the main goal of this study was to examine the impact of changing the number of inactives in the training set while keeping the number of actives fixed. For a given ratio, the inactives were randomly selected from the ZINC database (10 times, to prevent overestimation error). This concept was verified on three different protein targets (5-HT1A, HIV-1 protease and matrix metalloproteinase) and a set of algorithms (SMO, Naive Bayes, IBk, J48 and Random Forest) implemented in the WEKA package [4]. Two types of molecular fingerprints (MACCS and a hashed fingerprint) were used to represent the compounds, in order to determine their possible impact on machine learning performance.
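The sampling scheme described above (a fixed set of actives, a varying number of randomly drawn inactives, 10 repetitions per ratio) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' code: the function name `sample_training_sets`, the definition of the ratio as inactives per active, the ratio values and the placeholder molecule identifiers are all hypothetical, and sampling without replacement is assumed.

```python
import random


def sample_training_sets(actives, inactive_pool, ratio, n_repeats=10, seed=0):
    """Build n_repeats training sets that share the same actives.

    Assumption: `ratio` means inactives per active, so each set holds
    round(ratio * len(actives)) inactives drawn at random from the pool.
    """
    rng = random.Random(seed)  # seeded so the draws are reproducible
    n_inactives = int(round(ratio * len(actives)))
    if n_inactives > len(inactive_pool):
        raise ValueError("inactive pool too small for the requested ratio")

    training_sets = []
    for _ in range(n_repeats):
        # Draw inactives without replacement within a single training set.
        inactives = rng.sample(inactive_pool, n_inactives)
        # Label 1 marks an active, label 0 an (assumed) inactive.
        labelled = [(m, 1) for m in actives] + [(m, 0) for m in inactives]
        training_sets.append(labelled)
    return training_sets


if __name__ == "__main__":
    # Placeholder identifiers standing in for real molecules.
    actives = [f"active_{i}" for i in range(20)]
    pool = [f"zinc_{i}" for i in range(1000)]
    for ratio in (0.5, 1, 2, 5):  # hypothetical ratios
        sets = sample_training_sets(actives, pool, ratio)
        print(ratio, len(sets), len(sets[0]))
```

Each of the repeated training sets would then be passed to a classifier, and the results averaged over the repetitions for that ratio.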

Correspondence: kurczab@if-pan.krakow.pl. Department of Medicinal Chemistry, Institute of Pharmacology, Polish Academy of Sciences, Kraków, 31-343, Poland. Full list of author information is available at the end of the article.

