Abstract

Background
Interpreting the massive number of variants of uncertain significance (VUS) discovered by sequencing is the main challenge of the post-sequencing era. VUS also pose a particular problem in pharmacogenomics, as exemplified by drug-metabolizing enzymes such as CYP2C9. An accurate model that captures the relationship between large numbers of protein variants and their functional alterations is therefore crucial, both for the clinical annotation of the human variome and for virtual evolution in enzyme engineering for biotherapeutics.

Methods
We developed a model named variant effect recognition network for CYP2C9 (vERnet-P), which learns features from AlphaFold2-predicted protein structures and predicts the enzyme activity of missense single-nucleotide variants in CYP2C9. vERnet-P rests on two key strategies: the construction of amino acid interaction networks and the application of deep learning. In this way, it combines highly accurate AlphaFold2-predicted structures with novel techniques for enriching and capturing useful information from them. Building on accurate, preemptive prediction of CYP2C9 variants, we performed saturation mutagenesis prediction at several CYP2C9 sites to identify new variants with the expected function. The drug-metabolizing activities of these newly identified variants were then measured in vitro with two probe drugs.

Results
vERnet-P classified activity levels with an accuracy of up to 93.5% on the testing dataset. Accuracy was also 93.5% for recognizing both high-activity and low-activity variants, indicating that vERnet-P learned features related to enzyme activity rather than a selection bias toward one class of samples. On the testing cohort, the areas under the ROC curve and the precision-recall (PR) curve reached 0.971 and 0.966, respectively.
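The accuracy, per-class accuracy, ROC AUC and PR AUC figures reported above are standard binary-classification metrics. The snippet below is a minimal sketch of how such numbers are computed, assuming activity prediction is treated as a binary high- vs low-activity task with one probability-like score per variant. Three further points are assumptions, since the abstract does not state them: "per-class accuracy" is read as class-wise recall, average precision is used as the PR AUC estimator, and the labels and scores are made-up toy values, not vERnet-P outputs.

```python
from typing import Sequence


def accuracy(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of variants whose thresholded score matches the label (1 = high activity)."""
    correct = sum(int(s >= threshold) == y for y, s in zip(labels, scores))
    return correct / len(labels)


def per_class_accuracy(labels, scores, threshold: float = 0.5) -> dict:
    """Class-wise recall: accuracy restricted to the variants of each true class."""
    result = {}
    for cls in (0, 1):
        hits = [int(s >= threshold) == cls for y, s in zip(labels, scores) if y == cls]
        result[cls] = sum(hits) / len(hits)
    return result


def roc_auc(labels, scores) -> float:
    """ROC AUC as P(random positive scores above random negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores) -> float:
    """PR AUC estimate: mean precision at the rank of each positive (tied scores ordered arbitrarily)."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    n_pos = sum(labels)
    tp = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            tp += 1
            total += tp / rank
    return total / n_pos


if __name__ == "__main__":
    # Toy data only: 1 = high-activity variant, 0 = low-activity variant.
    labels = [1, 1, 1, 0, 0, 0]
    scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
    print(f"accuracy  = {accuracy(labels, scores):.3f}")           # 0.667
    print(f"per-class = {per_class_accuracy(labels, scores)}")
    print(f"ROC AUC   = {roc_auc(labels, scores):.3f}")            # 0.889
    print(f"PR AUC/AP = {average_precision(labels, scores):.3f}")  # 0.917
```

Reporting per-class accuracy alongside overall accuracy is what makes the "no selection bias" argument checkable. A model that simply predicted the majority class could score well overall but would show a large gap between the two class-wise values.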
In predicting CYP2C9 variant enzyme activity, vERnet-P agreed more closely with massively parallel assessments than eight other computational variant effect predictors. Using vERnet-P to perform saturation mutagenesis prediction at six CYP2C9 sites, we found 12 novel variants likely to change activity. With the wild-type prediction as the cutoff, we selected the 6 variants most likely to have increased activity and the 6 most likely to have decreased activity. Of these 12 variants, 10 were clearly confirmed by in vitro metabolic activity assays with both probe drugs.

Conclusions
vERnet-P achieved state-of-the-art performance in predicting CYP2C9 variant enzyme activity. Preemptive prediction of CYP2C9 variant function can efficiently inform clinical drug-dosing guidelines. Furthermore, saturation mutagenesis prediction at several sites indicated the evolutionary direction of CYP2C9 enzyme activity and uncovered previously unreported CYP2C9 variants with very strong functional alterations. The in vitro metabolic activity assessments were highly consistent with our in silico predictions, indicating the potential of AI models for inferring evolutionary direction and designing novel variants with the expected function.
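The saturation mutagenesis step described above can be sketched in a few lines. The sketch rests on stated assumptions: every non-wild-type amino acid is substituted at each chosen site, each mutant is scored by a predictor, and variants are split by the wild-type score with the top k kept on each side. `toy_predict` is a hypothetical stand-in that averages arbitrary per-residue weights. It is not vERnet-P. The sequence `MKTAYLG` and sites 2 and 6 are illustrative placeholders, not CYP2C9 or the six sites studied.

```python
from typing import Callable

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical toy predictor (NOT vERnet-P): arbitrary per-residue weights,
# averaged over the sequence, just so the scan has something to rank.
TOY_WEIGHTS = {aa: i / 20 for i, aa in enumerate(AMINO_ACIDS)}


def toy_predict(seq: str) -> float:
    return sum(TOY_WEIGHTS[aa] for aa in seq) / len(seq)


def saturation_scan(
    wt_seq: str, sites: list[int], predict: Callable[[str], float]
) -> list[tuple[str, float]]:
    """Score all 19 single-residue substitutions at each 1-based site."""
    results = []
    for pos in sites:
        wt_aa = wt_seq[pos - 1]
        for aa in AMINO_ACIDS:
            if aa == wt_aa:
                continue  # skip the wild-type residue itself
            mutant = wt_seq[: pos - 1] + aa + wt_seq[pos:]
            results.append((f"{wt_aa}{pos}{aa}", predict(mutant)))
    return results


def select_extremes(
    results: list[tuple[str, float]], wt_score: float, k: int
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Use the wild-type score as cutoff; keep the k highest above and k lowest below it."""
    increased = sorted((r for r in results if r[1] > wt_score), key=lambda r: r[1], reverse=True)[:k]
    decreased = sorted((r for r in results if r[1] < wt_score), key=lambda r: r[1])[:k]
    return increased, decreased


if __name__ == "__main__":
    wt = "MKTAYLG"  # hypothetical toy sequence, not CYP2C9
    wt_score = toy_predict(wt)
    scan = saturation_scan(wt, [2, 6], toy_predict)
    up, down = select_extremes(scan, wt_score, k=3)
    print("likely increased:", [name for name, _ in up])
    print("likely decreased:", [name for name, _ in down])
```

Using the wild-type prediction as the cutoff, rather than a fixed score threshold, makes the selection relative to the unmutated enzyme. "Increased" and "decreased" then mean "predicted above or below wild type" regardless of the predictor's absolute scale.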