To predict the anti-trypanosome effect of carbazole-derived compounds using quantitative structure-activity relationship (QSAR) modeling, five models were established: a linear method, random forest, radial basis kernel function support vector machine, linear combination mix-kernel function support vector machine, and nonlinear combination mix-kernel function support vector machine (NLMIX-SVM). The heuristic method and optimized CatBoost were used to select two different key descriptor sets for building the linear and nonlinear models, respectively. Hyperparameters in all nonlinear models were optimized by comprehensive learning particle swarm optimization, which offers low complexity and fast convergence. The models' robustness and reliability were then rigorously assessed using fivefold and leave-one-out cross-validation, y-randomization, and statistics including the concordance correlation coefficient (CCC), [Formula: see text], [Formula: see text], and [Formula: see text]. Among all the models, the NLMIX-SVM model, which was established by support vector regression using a nonlinear combination of the radial basis, sigmoid, and linear kernel functions as a new kernel function, demonstrated excellent learning and generalization abilities as well as robustness: [Formula: see text] = 0.9581 and mean square error (MSE) = 0.0199 for the training set, and [Formula: see text] = 0.9528 and MSE = 0.0174 for the test set. [Formula: see text], [Formula: see text], CCC, [Formula: see text], [Formula: see text], and [Formula: see text] are 0.9539, 0.8908, 0.9752, 0.9529, 0.9528, and 0.9633, respectively. The NLMIX-SVM method proved to be a promising approach for QSAR research. In addition, molecular docking experiments were conducted to analyze the properties of new derivatives, and a new potential candidate drug molecule was ultimately identified.
In summary, this study should aid the design and screening of novel anti-trypanosome drugs.
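Two of the validation statistics named above, MSE and CCC, can be sketched in a few lines. This is a minimal illustration that assumes Lin's standard definition of the concordance correlation coefficient with population (1/n) moments; the abstract does not state the exact formula used in the study, and the function names `mse` and `ccc` and the activity values in the usage lines are hypothetical.

```python
from statistics import fmean


def mse(y_true, y_pred):
    """Mean square error between observed and predicted activities."""
    return fmean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def ccc(y_true, y_pred):
    """Lin's concordance correlation coefficient (assumed definition).

    CCC = 2*cov / (var_true + var_pred + (mean_true - mean_pred)^2),
    with variances and covariance computed as population (1/n) moments.
    """
    mean_t, mean_p = fmean(y_true), fmean(y_pred)
    var_t = fmean((t - mean_t) ** 2 for t in y_true)
    var_p = fmean((p - mean_p) ** 2 for p in y_pred)
    cov = fmean((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    return 2 * cov / (var_t + var_p + (mean_t - mean_p) ** 2)


if __name__ == "__main__":
    # Hypothetical observed and predicted activity values, for illustration only.
    observed = [5.1, 6.3, 7.0, 5.8]
    predicted = [5.0, 6.5, 6.8, 6.0]
    print("MSE:", mse(observed, predicted))
    print("CCC:", ccc(observed, predicted))
```

Unlike a plain correlation coefficient, CCC penalizes systematic offset as well as scatter, so predictions that track the observations but are shifted by a constant score below 1.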