Abstract

The identification of drug-target interactions (DTIs) is an important step in pharmaceutical research for developing new therapeutic agents. However, experimental methods for identifying DTIs are time-consuming, expensive, and challenging. Computational methods that can accurately predict DTI pairs are therefore of great interest, because they can substantially reduce the time and resources spent on drug discovery. This study presents a machine-learning-based model, named kNN-DTIPred, for DTI prediction that addresses two common problems of DTI datasets: high dimensionality and class imbalance. First, target protein feature vectors are extracted using the Pseudo-Position Specific Scoring Matrix (PsePSSM), and drug compounds are represented by FP2 molecular fingerprints generated with the OpenBabel software. Lasso dimensionality reduction is then used to retain only the most discriminative features, while SMOTE (Synthetic Minority Over-sampling Technique) is applied to balance the classes. Five machine learning models were compared on four datasets. The best model, the k-Nearest Neighbors classifier, achieved overall prediction accuracies of 98.23%, 94.77%, 95.07%, and 93.09% on the enzyme, ion channel, G protein-coupled receptor, and nuclear receptor datasets, respectively, with areas under the curve (AUC) of 97.05%, 95.95%, 94.89%, and 94.29% on the same datasets. Additionally, our results showed that Lasso dimensionality reduction and SMOTE significantly improved predictive performance. This study demonstrates that the proposed kNN-DTIPred model is highly accurate and effective in predicting drug-target pairs and can accelerate DTI identification by narrowing the search space to be investigated in laboratory experiments.
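The processing steps named above (PsePSSM protein features, Lasso-based feature selection, SMOTE oversampling, k-nearest-neighbor classification) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The following are all hypothetical choices: the sigmoid normalization of the PSSM, the lag `lag_max`, the Lasso penalty `alpha`, the SMOTE and kNN neighbor counts, the Lasso-then-SMOTE order, and the helper names `psepssm`, `lasso_select`, `smote`, and `knn_predict`. FP2 fingerprint generation with OpenBabel is not reproduced, and the demo runs on synthetic vectors rather than the enzyme, ion channel, GPCR, or nuclear receptor datasets.

```python
import numpy as np


def psepssm(pssm, lag_max=2):
    """PsePSSM vector: 20 column means plus 20 lag-correlation terms per lag."""
    # Assumption: raw PSSM scores are squashed with a sigmoid before averaging.
    p = 1.0 / (1.0 + np.exp(-np.asarray(pssm, dtype=float)))
    n_res = p.shape[0]  # protein length L (must exceed lag_max)
    feats = [p.mean(axis=0)]
    for lag in range(1, lag_max + 1):
        diff = p[: n_res - lag] - p[lag:]  # residue i vs residue i + lag
        feats.append((diff ** 2).mean(axis=0))
    return np.concatenate(feats)  # length 20 * (1 + lag_max)


def lasso_select(X, y, alpha=0.1, n_iter=200):
    """Indices of features whose Lasso coefficient is non-zero (coordinate descent)."""
    X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)  # standardize columns
    y = y - y.mean()
    n, d = X.shape
    w = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(d):
            r = y - X @ w + X[:, j] * w[j]  # residual ignoring feature j
            rho = X[:, j] @ r / n
            # Soft-thresholding: coefficients with |rho| <= alpha become exactly 0.
            w[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / (col_sq[j] + 1e-12)
    return np.flatnonzero(w)


def smote(X_min, n_new, k=3, seed=None):
    """SMOTE: interpolate between minority samples and their minority-class neighbors."""
    rng = np.random.default_rng(seed)
    d2 = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)  # a sample is not its own neighbor
    nn = np.argsort(d2, axis=1)[:, :k]  # k nearest minority neighbors
    base = rng.integers(0, len(X_min), n_new)
    nbr = nn[base, rng.integers(0, k, n_new)]
    gap = rng.random((n_new, 1))  # uniform in [0, 1)
    return X_min[base] + gap * (X_min[nbr] - X_min[base])


def knn_predict(X_train, y_train, X_test, k=3):
    """Majority vote of the k nearest training samples (0/1 labels, odd k)."""
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nn = np.argsort(d2, axis=1)[:, :k]
    return (y_train[nn].mean(axis=1) > 0.5).astype(int)


# Toy demo on synthetic vectors (not the paper's data): 2 informative features of 10.
rng = np.random.default_rng(0)


def make(n_pos, n_neg):
    X = rng.normal(size=(n_pos + n_neg, 10))
    y = np.r_[np.ones(n_pos, int), np.zeros(n_neg, int)]
    X[y == 1, :2] += 3.0  # shift only the first two features for positives
    return X, y


X_tr, y_tr = make(15, 60)
X_te, y_te = make(15, 60)
keep = lasso_select(X_tr, y_tr.astype(float))
syn = smote(X_tr[y_tr == 1][:, keep], n_new=45, seed=1)  # oversample training set only
X_bal = np.vstack([X_tr[:, keep], syn])
y_bal = np.r_[y_tr, np.ones(45, int)]
acc = (knn_predict(X_bal, y_bal, X_te[:, keep]) == y_te).mean()
print("kept features:", keep, "test accuracy:", round(acc, 3))
```

In this sketch, oversampling is applied only to the training split, so synthetic samples never leak into the evaluation set. The abstract does not state how the original study split its data or combined the drug and protein vectors.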

