Cytochrome P450 (P450)-mediated bioactivation, which can lead to hepatotoxicity through the formation of reactive metabolites (RMs), is regarded as a major cause of drug failure. Herein, we aimed to establish machine learning models to predict P450-mediated bioactivation. Based on a literature-derived bioactivation dataset, models for benzene rings, nitrogen heterocycles and sulfur heterocycles were developed with four machine learning methods: Random Forest, Random Subspace, support vector machine (SVM) and Naïve Bayes. The models were assessed with metrics including precision, recall, F-measure and the area under the curve (AUC). Random Forest showed the best predictive performance, with AUC values of 0.949, 0.973 and 0.958 for the test sets of the benzene ring, nitrogen heterocycle and sulfur heterocycle models, respectively. 2D descriptors such as topological indices, 2D autocorrelations and Burden eigenvalues contributed most to the models. Furthermore, the models were applied to predict the occurrence of bioactivation in an external verification set. Drugs including selpercatinib, glafenine and encorafenib were predicted to undergo bioactivation into toxic RMs. To validate the predictions, an in vitro IC50 shift experiment was performed to assess bioactivation potential. Encorafenib and tirbanibulin showed bioactivation potential, with IC50 shifts of approximately 3- to 6-fold. Overall, this study provides a reliable and robust strategy for predicting P450-mediated bioactivation, which should aid the assessment of adverse drug reactions (ADRs) in the clinic and the design of new drug candidates with lower toxicity.
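For readers less familiar with the evaluation metrics named in the abstract, the following is a minimal sketch in plain Python of how precision, recall, F-measure and AUC can be computed for a binary bioactivation classifier. It is not the authors' pipeline: the function names, the toy labels and the scores are hypothetical, and the fold-shift helper assumes the common convention of dividing the IC50 measured without pre-incubation by the IC50 measured with pre-incubation, which the abstract does not spell out.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F-measure for binary labels (1 = bioactivated)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random
    negative (Mann-Whitney formulation); ties count as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def ic50_fold_shift(ic50_without_preincubation, ic50_with_preincubation):
    """Fold shift under the assumed convention: a larger value means
    stronger inhibition after pre-incubation."""
    return ic50_without_preincubation / ic50_with_preincubation


# Hypothetical toy data, not taken from the study.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 cutoff

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
auc = roc_auc(y_true, scores)
shift = ic50_fold_shift(10.0, 2.0)
```

Precision, recall and F-measure depend on the chosen classification cutoff, whereas AUC is computed from the ranking of scores alone, which is one reason it is often reported as the headline figure when comparing classifiers.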