Identifying potential drug target proteins is a crucial step in drug discovery and plays a key role in the study of the molecular mechanisms of disease. Because the majority of proteins exert their functions by interacting with each other, we propose a method that recognizes target proteins using the human protein–protein interaction network and graph theory. In the network, edges and vertices are weighted by the confidence scores of interactions and by descriptors of protein primary structure, respectively. Novel network topological features are defined and used to characterize proteins using existing databases. The widely used minimum redundancy maximum relevance method and the random forest algorithm are used to select the optimal feature subset and to construct a model for identifying potential drug target proteins at the proteome scale. The accuracies on the training set and test set are 89.55% and 85.23%, respectively. Using the constructed model, 2127 potential drug target proteins have been recognized, and 156 drug target proteins have been validated in a drug target database. In addition, some of the new drug target proteins can be considered as targets for treating diseases such as mucopolysaccharidosis, non-arteritic anterior ischemic optic neuropathy, Bernard–Soulier syndrome and pseudo-von Willebrand disease. It is anticipated that the proposed method may become a powerful high-throughput virtual screening tool for drug targets.
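To make the weighting scheme concrete, the following minimal sketch builds a small interaction network in which edges carry confidence scores and vertices carry a single primary-structure descriptor, then derives a few per-protein topological features. The protein names, the scores, the scalar descriptor and the three features (degree, strength, confidence-weighted neighbor descriptor) are illustrative assumptions, not the novel features defined in this work. The feature selection and random forest steps are omitted.

```python
from collections import defaultdict


def build_network(interactions, descriptors):
    """Build an undirected weighted network.

    interactions: iterable of (protein_a, protein_b, confidence) tuples.
    descriptors:  dict mapping every protein to one scalar descriptor
                  (a hypothetical stand-in for primary-structure descriptors).
    """
    adj = defaultdict(dict)
    for a, b, confidence in interactions:
        adj[a][b] = confidence  # edge weight = interaction confidence
        adj[b][a] = confidence
    for protein in descriptors:
        adj.setdefault(protein, {})  # keep proteins without interactions
    return dict(adj), dict(descriptors)


def node_features(adj, node_weight, protein):
    """Illustrative topological features for one protein (assumed, not the paper's)."""
    neighbors = adj[protein]
    degree = len(neighbors)
    strength = sum(neighbors.values())  # sum of edge confidences
    if strength > 0:
        # Mean neighbor descriptor, weighted by interaction confidence.
        neighbor_descriptor = (
            sum(conf * node_weight[n] for n, conf in neighbors.items()) / strength
        )
    else:
        neighbor_descriptor = 0.0
    return {
        "degree": degree,
        "strength": strength,
        "own_descriptor": node_weight[protein],
        "neighbor_descriptor": neighbor_descriptor,
    }


# Toy data: all identifiers and values are made up for illustration.
interactions = [("P1", "P2", 0.9), ("P1", "P3", 0.5), ("P2", "P3", 0.7)]
descriptors = {"P1": 0.2, "P2": 0.4, "P3": 0.6, "P4": 0.1}

adj, node_weight = build_network(interactions, descriptors)
feature_table = {p: node_features(adj, node_weight, p) for p in sorted(adj)}
```

Each row of `feature_table` could then serve as the input vector for a feature selection step and a classifier; that pipeline is not shown here.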