Abstract
DNA-binding proteins are essential to DNA function. They are also integral to the life processes of various organisms, for instance DNA recombination and replication. Identifying such proteins helps medical researchers pinpoint the causes of disease. Traditional techniques for identifying DNA-binding proteins are expensive and time-consuming. Machine learning methods can identify these proteins quickly and efficiently. However, the accuracy of existing methods is not high enough. In this paper, we propose a framework to identify DNA-binding proteins. The proposed framework first combines three feature extraction methods: PseKNC (ps), MomoKGap (mo), and MomoDiKGap (md). We then apply AdaBoost weight ranking to select optimal feature subsets from these three types of features. Based on the selected features, three algorithms (k-nearest neighbor (KNN), Support Vector Machine (SVM), and Random Forest (RF)) are applied to classify the proteins. Finally, three predictors for identifying DNA-binding proteins are established, including [Formula: see text], [Formula: see text], [Formula: see text]. We use benchmark and independent datasets to train and evaluate the proposed framework. Three tests are performed: the Jackknife test, 10-fold cross-validation, and an independent test. Among them, the accuracy of ps+md is the highest. We name the model with the best result psmdDBPs and apply it to identify DNA-binding proteins.
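The feature-selection and evaluation steps described above can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the abstract does not say how the AdaBoost weights are turned into a ranking, so the sketch assumes decision stumps as weak learners and scores each feature by the summed vote weight (alpha) of the stumps that split on it. The synthetic feature matrix stands in for the PseKNC/MomoKGap/MomoDiKGap features, a 1-nearest-neighbor classifier stands in for KNN, and the function names `best_stump`, `adaboost_feature_scores`, and `jackknife_accuracy` are hypothetical.

```python
import math
import random


def best_stump(X, y, w):
    """Return (weighted error, feature, threshold, polarity) of the best stump."""
    best = (float("inf"), 0, 0.0, 1)
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for pol in (1, -1):
                # A stump predicts `pol` above the threshold and `-pol` otherwise.
                err = sum(
                    wi
                    for row, yi, wi in zip(X, y, w)
                    if (pol if row[j] > thr else -pol) != yi
                )
                if err < best[0]:
                    best = (err, j, thr, pol)
    return best


def adaboost_feature_scores(X, y, rounds=10):
    """Score each feature by the total alpha of the stumps that split on it."""
    n = len(X)
    w = [1.0 / n] * n
    scores = [0.0] * len(X[0])
    for _ in range(rounds):
        err, j, thr, pol = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        scores[j] += alpha
        # Misclassified samples gain weight, correctly classified ones lose it.
        w = [
            wi * math.exp(-alpha * yi * (pol if row[j] > thr else -pol))
            for row, yi, wi in zip(X, y, w)
        ]
        total = sum(w)
        w = [wi / total for wi in w]
    return scores


def jackknife_accuracy(X, y, features):
    """Leave-one-out (jackknife) accuracy of a 1-NN classifier on `features`."""
    correct = 0
    for i, (xi, yi) in enumerate(zip(X, y)):
        nearest = min(
            (k for k in range(len(X)) if k != i),
            key=lambda k: sum((xi[f] - X[k][f]) ** 2 for f in features),
        )
        correct += y[nearest] == yi
    return correct / len(X)


# Synthetic stand-in data (an assumption, not the paper's datasets):
# feature 0 separates the two classes, features 1-3 are pure noise.
random.seed(0)
y = [1 if i % 2 == 0 else -1 for i in range(40)]
X = [
    [yi * random.uniform(1.0, 1.5)] + [random.uniform(-1, 1) for _ in range(3)]
    for yi in y
]

scores = adaboost_feature_scores(X, y, rounds=5)
ranking = sorted(range(len(scores)), key=lambda j: -scores[j])
top = ranking[:1]  # keep only the highest-ranked feature
acc = jackknife_accuracy(X, y, top)
print("feature ranking:", ranking)
print("jackknife accuracy on selected features:", acc)
```

Because feature 0 is deliberately separable in this toy data, every boosting round selects it; with real sequence-derived features the vote weight would be spread across many features, and the size of the retained subset would be a tuning choice.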