Recent research has shown that cyclin-dependent kinase 7 (CDK7), cyclin-dependent kinase 9 (CDK9) and cyclin T1 (CCNT1) can contribute to transcriptional misregulation in cancer. We combined traditional virtual screening with artificial intelligence models to screen the FDA database for multi-target inhibitors of these proteins. The coefficient of determination (R2) was used to evaluate the accuracy of the artificial intelligence models. For the CDK7 inhibitor dataset, all nine basic models had R2 above 0.5, and Adaptive Boosting-Decision Tree Regression (AdaBoost), Support Vector Machine Regression (SVR) and Ridge Regression (Ridge) achieved good results, with test-set R2 of 0.886, 0.860 and 0.815, respectively. For the CDK9 inhibitor dataset, AdaBoost, Random Forest (RF) and Ridge achieved good results, with test-set R2 of 0.833, 0.788 and 0.759, respectively. AdaBoost and Ridge therefore appear to generalize better than the other basic models. AdaBoost combines many weak regressors into a stronger regressor, which can help it control overfitting. Ridge regression adds a penalty term to the least-squares objective as regularization. With this penalty term, the estimate of the regression coefficients is no longer unbiased; ridge regression thus addresses the ill-conditioned matrix problem at the cost of unbiasedness. To evaluate the stability of binding between the proteins and the candidate leads, molecular dynamics (MD) simulations were performed to verify whether the candidate leads were docked well in the protein binding sites. Based on the results of virtual screening, the artificial intelligence models and the MD simulations, we suggest ZINC3830891 (Glutathione) and ZINC19363537 (Tetraethylene pentamine) as possible multi-target lead inhibitors for triple-negative breast cancer (TNBC).
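The trade-off described for ridge regression, and the R2 metric used to score the models, can be illustrated with a minimal sketch. It is not the authors' pipeline: the toy data, the single-feature model without intercept, the penalty strength `alpha`, and the helper names `r2_score` and `ridge_slope` are all assumptions made for illustration.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def ridge_slope(x, y, alpha):
    """Closed-form ridge estimate for a one-feature model y ~ w * x (no intercept).

    Minimizes sum((y_i - w * x_i)**2) + alpha * w**2.
    Setting alpha = 0 recovers ordinary least squares.
    """
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + alpha)


# Hypothetical toy data (not from the paper): y is roughly 2 * x.
x_train = [1.0, 2.0, 3.0, 4.0]
y_train = [2.1, 3.9, 6.2, 7.8]
x_test = [5.0, 6.0]
y_test = [10.1, 11.9]

w_ols = ridge_slope(x_train, y_train, alpha=0.0)    # unpenalized estimate
w_ridge = ridge_slope(x_train, y_train, alpha=1.0)  # shrunk toward zero

# Score both fits on the held-out points.
r2_ols = r2_score(y_test, [w_ols * xi for xi in x_test])
r2_ridge = r2_score(y_test, [w_ridge * xi for xi in x_test])

print(f"OLS slope   = {w_ols:.3f}, test R2 = {r2_ols:.3f}")
print(f"Ridge slope = {w_ridge:.3f}, test R2 = {r2_ridge:.3f}")
```

The penalty enlarges the denominator, so the ridge slope is always pulled toward zero relative to the least-squares slope; that shrinkage is the bias mentioned above. On this well-conditioned toy data the bias only costs accuracy. The benefit of the penalty shows up when the features are nearly collinear and the unpenalized problem is ill-conditioned.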