One goal of metabolomics is to define and monitor the entire metabolite complement of a cell, but this goal remains far from reach because systematic and rapid approaches for determining the biotransformations of newly discovered metabolites are lacking. In drug development, the metabolic biotransformation of a new chemical entity (NCE) is of particular interest because it may profoundly affect the compound's bioavailability, activity and toxicity profile. In silico prediction of the site of metabolism (SOM) in phase I cytochrome P450-mediated reactions is usually a starting point for metabolic pathway studies and may also assist in drug/lead optimization. This article reports cytochrome P450 (CYP450)-mediated SOM prediction for the six most important metabolic reactions, combining machine learning with semi-empirical quantum chemical calculations. Non-local models were developed on the basis of a large dataset comprising 1858 metabolic reactions extracted from 1034 heterogeneous chemicals. In validation, the overall accuracies for all six reaction types are higher than 0.81, and four of them exceed 0.90. In further receiver operating characteristic (ROC) analyses, each SOM model gave an area under the curve (AUC) value over 0.86, indicating good predictive power. In an external test on a previously published dataset, 80% of the experimentally observed SOMs were correctly identified by applying the full set of our SOM models. The program package SOME_v1.0 (Site Of Metabolism Estimator), developed on the basis of our models, is available at http://www.dddc.ac.cn/adme/myzheng/SOME_1_0.tar.gz.
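The two validation metrics quoted above, overall accuracy and ROC AUC, can be illustrated with a minimal Python sketch. It is not the SOME_v1.0 implementation: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions invented for illustration, where each entry stands for one candidate atom site with a model score.

```python
def accuracy(labels, predictions):
    """Fraction of candidate sites whose predicted class matches the label."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    A tie between a positive and a negative counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data (hypothetical): 1 = experimentally observed SOM, 0 = not a SOM.
labels = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.2, 0.7, 0.6, 0.1, 0.4, 0.3, 0.05]

# Assumed decision threshold of 0.5 to turn scores into class predictions.
predictions = [1 if s >= 0.5 else 0 for s in scores]

print(accuracy(labels, predictions))
print(roc_auc(labels, scores))
```

Accuracy depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why reporting both gives a fuller picture of a classifier's performance.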