Abstract

Background: The low success rate and high cost of drug discovery require the development of new paradigms to identify molecules of therapeutic value. The Anatomical Therapeutic Chemical (ATC) Code System is a classification proposed by the World Health Organization (WHO) that assigns multi-level codes to compounds based on their therapeutic, pharmacological and chemical characteristics as well as the in vivo site(s) of activity. The ability to predict ATC codes of compounds can assist in the creation of high-quality chemical libraries for drug screening and in applications such as drug repositioning. We propose a machine learning architecture called tiered learning for prediction of ATC codes that relies on the prediction results of the higher levels of the ATC code to simplify the predictions of the lower levels.

Results: The proposed approach was validated using a number of compounds in both cross-validation and test settings. The validation experiments compared chemical descriptors, initialization methods and classification algorithms. The prediction accuracy obtained with tiered learning was found to be comparable to or better than that of established methods. Additionally, the experiments demonstrated the generalizability of the tiered learning architecture, in that its use was found to improve prediction rates for a majority of machine learning algorithms when compared to their stand-alone application.

Conclusion: The basis of our approach lies in the observation that anatomical-therapeutic biological activity of certain types typically precludes activities of many other types. Thus, there exists a characteristic distribution of the ATC codes, which can be leveraged to limit the search space of possible codes that can be ascribed at a particular level once the codes at the preceding levels are known. Tiered learning utilizes this observation to constrain the learning space for ATC codes at a particular level based on the ATC code at higher levels. This simplifies the prediction and allows for improved accuracy.
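The idea of constraining each level by the level above it can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `TieredClassifier`, the helper names, the 1-nearest-neighbour rule used as a stand-in classifier, and the two-dimensional descriptor vectors are all assumptions made for illustration. The only external fact relied on is the standard ATC layout, in which a full code such as A10BA02 is read as five cumulative prefixes of lengths 1, 3, 4, 5 and 7.

```python
from collections import defaultdict


def atc_prefixes(code):
    """Split a full 7-character ATC code into its five cumulative level prefixes."""
    return [code[:cut] for cut in (1, 3, 4, 5, 7)]


def nearest_label(x, examples):
    """Stand-in classifier: label of the closest training vector (1-NN)."""
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(examples, key=lambda ex: sq_dist(x, ex[0]))[1]


class TieredClassifier:
    """Hypothetical tiered predictor: one small model per parent prefix."""

    def __init__(self, n_levels=2):
        self.n_levels = n_levels
        # parent prefix -> list of (descriptor vector, child prefix)
        self.models = defaultdict(list)

    def fit(self, X, codes):
        for x, code in zip(X, codes):
            parent = ""  # the empty prefix holds the level-1 model
            for child in atc_prefixes(code)[: self.n_levels]:
                self.models[parent].append((x, child))
                parent = child
        return self

    def predict(self, x):
        parent = ""
        for _ in range(self.n_levels):
            # Only children of the already-predicted prefix are candidates,
            # so the search space shrinks at every level.
            parent = nearest_label(x, self.models[parent])
        return parent


# Toy descriptors invented for illustration; they carry no chemical meaning.
X = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.8), (0.8, 0.9)]
codes = ["A10BA02", "A10BB01", "C07AB02", "C09AA01"]

model = TieredClassifier(n_levels=2).fit(X, codes)
print(model.predict((0.85, 0.95)))  # level-2 prefix for a query near the "C" examples
```

Once the first level is predicted as C, the second-level model never considers A10 as a candidate. Any classifier could replace `nearest_label`, which mirrors the claim that the architecture can wrap different learning algorithms.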

Highlights

  • The low success rate and high cost of drug discovery require the development of new paradigms to identify molecules of therapeutic value

  • The tiered learning architecture: like the other works surveyed in the previous section, we employ a supervised formulation to solve the Anatomical Therapeutic Chemical (ATC) code prediction problem

  • We selected compounds that were either approved drugs or had reached phase-III trials and had high-quality information associated with them, including information on their ATC code, targets, and method of action (MOA)


Summary

Introduction

The low success rate and high cost of drug discovery require the development of new paradigms to identify molecules of therapeutic value. The ability to predict ATC codes of compounds can assist in the creation of high-quality chemical libraries for drug screening and in applications such as drug repositioning. Drug discovery efforts typically start by screening a large number of compounds to identify “leads”, which subsequently undergo optimization and in vivo tests of efficacy and pharmacokinetics to identify candidates for clinical trials. Repositioning an existing drug to a novel pathology is an alluring, though limited, alternative to de novo drug design [2]. In both of the above problem formulations, the ability to identify compounds that are therapeutically of interest vis-à-vis a particular pathology is critical.

