Research on potential therapeutic targets and new mechanisms of action can greatly improve the efficiency of new drug development. Polygenic diseases, such as diabetes, are caused by the interaction of multiple gene loci and environmental factors. In this study, a disease target identification algorithm based on protein recognition is proposed. In this method, targets related and unrelated to the treatment of diabetes are collected from literature databases. The proteins corresponding to each target are then queried to construct a protein dataset. Six protein feature extraction algorithms (AAC, CKSAAGP, DDE, DPC, GAAP, and TPC) are used to obtain feature vectors for each protein, which are merged into a full feature vector. A novel classifier, forgeNet_GPC, based on forgeNet and the Gaussian process classifier (GPC), is proposed to classify the proteins. In forgeNet_GPC, forgeNet selects the important features and GPC solves the classification problem. The experimental results show that forgeNet_GPC outperforms 22 other classifiers in terms of ROC-AUC, PR-AUC, MCC, Youden Index, and Kappa.
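
To illustrate the feature-merging step described above, the following minimal sketch computes two of the six descriptors, AAC and DPC, and concatenates them into a single vector. It assumes the commonly used definitions of these descriptors (AAC as the fraction of each of the 20 standard amino acids, DPC as the fraction of each of the 400 adjacent residue pairs). The function names are hypothetical, the remaining four descriptors and the forgeNet_GPC classifier are omitted, and the code is not taken from the study itself.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def clean(sequence):
    """Upper-case the sequence and drop non-standard residues.

    Assumption: dropping (rather than masking) unknown residues is a
    simplification made for this sketch; it joins their neighbours.
    """
    return "".join(r for r in sequence.upper() if r in AMINO_ACIDS)


def aac(sequence):
    """Amino acid composition: 20 values, one fraction per residue."""
    seq = clean(sequence)
    n = len(seq)
    return [seq.count(a) / n if n else 0.0 for a in AMINO_ACIDS]


def dpc(sequence):
    """Dipeptide composition: 400 values, one fraction per adjacent pair."""
    seq = clean(sequence)
    # Overlapping pairs: positions (0,1), (1,2), ..., (n-2, n-1).
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    n = len(pairs)
    return [
        pairs.count(a + b) / n if n else 0.0
        for a, b in product(AMINO_ACIDS, repeat=2)
    ]


def full_feature_vector(sequence):
    """Merge the per-descriptor vectors by simple concatenation."""
    return aac(sequence) + dpc(sequence)
```

Under these assumptions each protein maps to a 420-dimensional vector (20 AAC values followed by 400 DPC values). A feature selector and a classifier would then operate on such vectors, one per protein in the dataset.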