Abstract
Background: The World Anti-Doping Agency (WADA) publishes the Prohibited List, a manually compiled international standard of substances and methods prohibited in-competition, out-of-competition and in particular sports. It would be ideal to be able to identify all substances that have one or more performance-enhancing pharmacological actions in an automated, fast and cost-effective way. Here, we use experimental data derived from the ChEMBL database (~7,000,000 activity records for 1,300,000 compounds) to build a database model that takes into account both structure and experimental information, and use this database to predict both on-target and off-target interactions between these molecules and targets relevant to doping in sport.
Results: The ChEMBL database was screened and eight well-populated categories of activities (Ki, Kd, EC50, ED50, activity, potency, inhibition and IC50) were used for a rule-based filtering process to define the labels “active” or “inactive”. The “active” compounds for each of the ChEMBL families were thereby defined and these populated our bioactivity-based filtered families. A structure-based clustering step was subsequently performed in order to split families with more than one distinct chemical scaffold. This produced refined families, whose members share both a common chemical scaffold and bioactivity against a common target in ChEMBL.
Conclusions: We have used the Parzen-Rosenblatt machine learning approach to test whether compounds in ChEMBL can be correctly predicted to belong to their appropriate refined families. Validation tests using the refined families gave a significant increase in predictivity compared with the filtered or with the original families. Out of 61,660 queries in our Monte Carlo cross-validation, belonging to 19,639 refined families, 41,300 (66.98%) had the parent family as the top prediction and 53,797 (87.25%) had the parent family in the top four hits.
Having thus validated our approach, we used it to identify the protein targets associated with the WADA prohibited classes. For compounds where we do not have experimental data, we use their computed patterns of interaction with protein targets to make predictions of bioactivity. We hope that other groups will test these predictions experimentally in the future.
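The family-prediction step described in the abstract can be illustrated with a minimal Python sketch of a Parzen-Rosenblatt window classifier. Everything specific here is an assumption made for illustration and is not taken from the paper: fingerprints are represented as sets of on-bit indices, similarity is Tanimoto, the kernel is Gaussian with a hypothetical bandwidth of 0.2, and the toy families and the helper names (`rank_families`, `top_k_rate`) are invented. In a real cross-validation each query would be held out of its parent family before scoring.

```python
import math

def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)

def gaussian_kernel(similarity: float, bandwidth: float) -> float:
    """Gaussian window on the Tanimoto distance (1 - similarity)."""
    distance = 1.0 - similarity
    return math.exp(-(distance ** 2) / (2.0 * bandwidth ** 2))

def rank_families(query, families, bandwidth=0.2):
    """Rank family names by the mean kernel value over their members.

    The mean of kernel values centred on each member is the
    Parzen-Rosenblatt window estimate of the family's density at the query.
    """
    scores = {
        name: sum(gaussian_kernel(tanimoto(query, m), bandwidth) for m in members)
        / len(members)
        for name, members in families.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

def top_k_rate(queries, families, k, bandwidth=0.2):
    """Fraction of (fingerprint, parent_family) queries whose parent is in the top k."""
    hits = sum(
        1
        for fingerprint, parent in queries
        if parent in rank_families(fingerprint, families, bandwidth)[:k]
    )
    return hits / len(queries)

# Hypothetical toy data: two families of two fingerprints each.
families = {
    "family_A": [frozenset({1, 2, 3, 4}), frozenset({1, 2, 3, 5})],
    "family_B": [frozenset({10, 11, 12}), frozenset({10, 11, 13})],
}
query = frozenset({1, 2, 3, 6})  # shares three bits with each member of family_A

ranking = rank_families(query, families)
top1 = top_k_rate([(query, "family_A")], families, k=1)

# The reported hit rates follow directly from the counts in the abstract.
top1_reported = round(100 * 41300 / 61660, 2)  # 66.98
top4_reported = round(100 * 53797 / 61660, 2)  # 87.25
```

With these toy inputs the query is ranked into `family_A` first, because its Tanimoto similarity to both members is 0.6 while it shares no bits with `family_B`. The last two lines only reproduce the percentages quoted above from the stated counts.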
Highlights
The World Anti-Doping Agency (WADA) publishes the Prohibited List, a manually compiled international standard of substances and methods prohibited in-competition, out-of-competition and in particular sports
In previous computational work [2,3], we demonstrated that molecules can be classified into performance-enhancing classes using MACCS and CDK cheminformatics descriptors and machine learning methods including Random Forest, k-Nearest Neighbours and Naive Bayes
We propose a novel methodology for predicting unexplored compound-to-target associations by drawing on the wealth of information in the ChEMBL database, and illustrate it using a number of compounds explicitly mentioned in the WADA Prohibited List
Summary
The World Anti-Doping Agency (WADA) defines which chemical compounds and medical procedures are prohibited by publishing the Prohibited List, a manually compiled international standard identifying substances and methods prohibited in-competition, out-of-competition and in particular sports. In subsequent work [4], we introduced the UFS-MACCS hybrid descriptor, which combines shape and chemistry information, and used it to classify a dataset containing 5,245 molecules in ten prohibited classes from the 2005 WADA dataset and 111,231 presumed inactive molecules from the National Cancer Institute (NCI) database. These classification exercises were based entirely on molecular similarity and included no explicit predictions of interactions of compounds with protein targets.