Identifying drug–target interactions (DTIs) is critical for discovering candidate target proteins for new drugs. However, traditional experimental methods for discovering DTIs are time-consuming, tedious, and expensive, and often suffer from high false-positive and false-negative rates. Computational methods for predicting DTIs have therefore received extensive attention in recent years. In this paper, we present an effective prediction model based on drug molecular structure data and protein sequence data. The model proceeds in three steps. First, we transform each target protein sequence into a position-specific scoring matrix (PSSM), so that the features retain biological evolutionary information, and we describe the chemical structure of each drug compound with a feature vector of molecular substructure fingerprints. Second, we use the Legendre moments algorithm to extract new features from the PSSM. Finally, we use a classification algorithm called rotation forest to perform prediction. We tested the model on four gold standard data sets: enzymes, G-protein-coupled receptors, ion channels, and nuclear receptors. Under five-fold cross-validation, the proposed method achieves average accuracies of 0.9026, 0.8260, 0.8703, and 0.7444 on these four data sets. We also compare the proposed method with the support vector machine and other existing approaches. The results show that the proposed model outperforms the compared methods and is feasible, effective, and robust for predicting potential DTIs.
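
To make the feature-extraction step concrete, the following is a minimal sketch of computing two-dimensional Legendre moments from a PSSM-like matrix with NumPy. It is an illustration under stated assumptions, not the implementation used in the paper: the function name legendre_moments, the maximum order of 4, the midpoint mapping of matrix indices onto [-1, 1], and the random 120 x 20 stand-in matrix are all hypothetical choices, since the abstract does not specify them.

```python
import numpy as np
from numpy.polynomial import legendre


def legendre_moments(matrix, max_order):
    """Approximate 2-D Legendre moments of a matrix, orders 0..max_order per axis.

    Hypothetical helper: the discretisation (cell midpoints mapped onto
    [-1, 1]) is one common convention, assumed here for illustration.
    """
    matrix = np.asarray(matrix, dtype=float)
    n_rows, n_cols = matrix.shape

    # Map row and column indices to cell midpoints in [-1, 1].
    x = (2.0 * np.arange(n_rows) + 1.0) / n_rows - 1.0
    y = (2.0 * np.arange(n_cols) + 1.0) / n_cols - 1.0

    # legvander(x, deg)[i, p] holds the Legendre polynomial P_p evaluated at x[i].
    px = legendre.legvander(x, max_order)  # shape (n_rows, max_order + 1)
    py = legendre.legvander(y, max_order)  # shape (n_cols, max_order + 1)

    # Normalisation (2p + 1)(2q + 1) / (n_rows * n_cols) for each order pair (p, q).
    orders = 2.0 * np.arange(max_order + 1) + 1.0
    norm = np.outer(orders, orders) / (n_rows * n_cols)

    # Entry (p, q) is the double sum of P_p(x_i) * P_q(y_j) * matrix[i, j], normalised.
    return norm * (px.T @ matrix @ py)


# Stand-in for a real PSSM: one row per residue, one column per amino acid.
# The values are random placeholders, not real evolutionary scores.
rng = np.random.default_rng(0)
pssm = rng.integers(-10, 11, size=(120, 20))

# Flatten the moment matrix into a fixed-length descriptor (assumed order 4).
features = legendre_moments(pssm, max_order=4).ravel()
```

Because the number of moments depends only on the chosen order, proteins of different lengths map to descriptors of the same size, which is what a downstream classifier such as rotation forest needs as input.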