A Scalable and Robust Ensemble Deep Learning Method for Predicting Drug-Target Interactions.
Accurate identification of drug-target interactions (DTIs) is a crucial step in drug discovery. Computational DTI prediction methods can significantly reduce the time and cost associated with drug development. However, effectively integrating multisource features for high-precision DTI prediction remains a challenge. In this study, we propose EDeepDTI, an ensemble deep learning framework designed to increase the accuracy and generalizability of DTI predictions by efficiently integrating multi-view features. EDeepDTI calculates multiple molecular fingerprints to extract rich substructural information from drugs, leverages several advanced pre-trained models to generate drug and protein features enriched with structural and semantic information, and calculates multiple semantic similarity features for drugs and proteins using various similarity measures. During the ensemble learning process, we design a deep learning base learner for each unique pairing of drug and protein features. This ensures that each base learner captures distinct feature interactions, enhancing both independence and diversity within the ensemble. Finally, a greedy strategy is employed to aggregate the predictions from all base learners to improve overall performance. The experimental results demonstrate that EDeepDTI and its variant consistently outperform the baseline methods across multiple datasets and prediction tasks, highlighting the superior performance, robustness, and scalability of EDeepDTI.
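The greedy aggregation step mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not specify the exact strategy, so this example assumes a forward selection with replacement over base-learner validation predictions, scored with the Brier score; the function names `brier`, `greedy_ensemble`, and `ensemble_predict` are hypothetical and are not taken from EDeepDTI.

```python
import numpy as np


def brier(y_true, y_prob):
    """Mean squared error between labels and probabilities (lower is better)."""
    return float(np.mean((np.asarray(y_true) - np.asarray(y_prob)) ** 2))


def greedy_ensemble(val_preds, y_val, max_rounds=10):
    """Assumed greedy forward selection with replacement.

    val_preds: array (n_learners, n_samples) of validation probabilities.
    Returns the indices of the selected base learners (repeats allowed).
    """
    val_preds = np.asarray(val_preds, dtype=float)
    selected = []
    running_sum = np.zeros(val_preds.shape[1])
    best_score = np.inf
    for _ in range(max_rounds):
        # Score the averaged ensemble that would result from adding each learner.
        scores = [
            brier(y_val, (running_sum + p) / (len(selected) + 1))
            for p in val_preds
        ]
        idx = int(np.argmin(scores))
        if scores[idx] >= best_score:
            break  # no candidate improves the ensemble, so stop
        best_score = scores[idx]
        selected.append(idx)
        running_sum += val_preds[idx]
    return selected


def ensemble_predict(test_preds, selected):
    """Average the test predictions of the selected base learners."""
    return np.asarray(test_preds, dtype=float)[selected].mean(axis=0)
```

In this sketch each row of `val_preds` would come from one base learner trained on one drug-feature and protein-feature pairing; any validation metric could replace the Brier score.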
- Research Article
75
- 10.2174/0929867327666200907141016
- Sep 7, 2020
- Current Medicinal Chemistry
Drug-target interaction (DTI) prediction plays a central role in drug discovery. Computational methods for DTI prediction have gained more attention because carrying out in vitro and in vivo experiments on a large scale is costly and time-consuming. Machine learning methods, especially deep learning, are widely applied to DTI prediction. In this study, the main goal is to provide a comprehensive overview of deep learning-based DTI prediction approaches. Here, we investigate the existing approaches from multiple perspectives. We explore these approaches to find out which deep network architectures are utilized to extract features from drug compounds and protein sequences. Also, the advantages and limitations of each architecture are analyzed and compared. Moreover, we explore how descriptors for drug and protein features are combined. Likewise, a list of datasets that are commonly used in DTI prediction is investigated. Finally, current challenges are discussed and a short future outlook of deep learning in DTI prediction is given.
- Research Article
111
- 10.1186/s12859-017-1460-z
- Jan 17, 2017
- BMC Bioinformatics
Background: In silico drug-target interaction (DTI) prediction plays an integral role in drug repositioning: the discovery of new uses for existing drugs. One popular method of drug repositioning is network-based DTI prediction, which uses complex network theory to predict DTIs from a drug-target network. Currently, most network-based DTI prediction is based on machine learning, with methods such as Restricted Boltzmann Machines (RBM) or Support Vector Machines (SVM). These methods require additional information about the characteristics of drugs, targets and DTIs, such as chemical structure, genome sequence, binding types, causes of interactions, etc., and do not perform satisfactorily when such information is unavailable. We propose a new, alternative method for DTI prediction that uses only network topology information in an attempt to solve this problem. Results: We compare our method for DTI prediction against the well-known RBM approach. We show that when applied to the MATADOR database, our approach based on node neighborhoods yields higher precision for high-ranking predictions than RBM when no information regarding DTI types is available. Conclusion: This demonstrates that approaches purely based on network topology provide a more suitable approach to DTI prediction in the many real-life situations where little or no prior knowledge is available about the characteristics of drugs, targets, or their interactions.
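To make the idea of topology-only prediction concrete, here is a minimal sketch of a neighborhood-based score on a bipartite drug-target network. The abstract above does not give the scoring rule, so this example assumes a simple count of length-3 paths (drug, shared target, neighboring drug, candidate target); the function name `path3_scores` is hypothetical and this is not the authors' exact method.

```python
import numpy as np


def path3_scores(adjacency):
    """Score unknown drug-target pairs by counting length-3 paths.

    adjacency: binary matrix (n_drugs, n_targets), 1 = known interaction.
    For an unknown pair (d, t), the score is the number of paths
    d - t' - d' - t in the bipartite network. Known pairs are set to 0
    so that only candidate interactions are ranked.
    """
    A = np.asarray(adjacency, dtype=int)
    # (A @ A.T) counts targets shared by two drugs; multiplying by A again
    # propagates those shared neighborhoods to candidate targets.
    walks = A @ A.T @ A
    return np.where(A == 1, 0, walks)


# Usage: three drugs, three targets.
A = np.array([
    [1, 1, 0],
    [0, 1, 1],
    [0, 0, 1],
])
scores = path3_scores(A)
```

Ranking the unknown pairs by this score uses no chemical or sequence information, which is the setting the abstract targets.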
- Research Article
8
- 10.1016/j.ymeth.2024.01.018
- Feb 14, 2024
- Methods (San Diego, Calif.)
GSL-DTI: Graph structure learning network for Drug-Target interaction prediction
- Research Article
93
- 10.1016/j.ymeth.2017.05.016
- May 24, 2017
- Methods
Drug-target interaction prediction using ensemble learning and dimensionality reduction.
- Research Article
58
- 10.1093/bib/bbac109
- Apr 4, 2022
- Briefings in Bioinformatics
Drug-target interaction (DTI) prediction plays an important role in drug repositioning, drug discovery and drug design. However, due to the large size of the chemical and genomic spaces and the complex interactions between drugs and targets, experimental identification of DTIs is costly and time-consuming. In recent years, the emerging graph neural network (GNN) has been applied to DTI prediction because DTIs can be represented effectively using graphs. However, some of these methods are only based on homogeneous graphs, and some consist of two decoupled steps that cannot be trained jointly. To further explore GNN-based DTI prediction by integrating heterogeneous graph information, this study regards DTI prediction as a link prediction problem and proposes an end-to-end model based on HETerogeneous graph with Attention mechanism (DTI-HETA). In this model, a heterogeneous graph is first constructed based on the drug-drug and target-target similarity matrices and the DTI matrix. Then, the graph convolutional neural network is utilized to obtain the embedded representation of the drugs and targets. To highlight the contribution of different neighborhood nodes to the central node in aggregating the graph convolution information, a graph attention mechanism is introduced into the node embedding process. Afterward, an inner product decoder is applied to predict DTIs. To evaluate the performance of DTI-HETA, experiments are conducted on two datasets. The experimental results show that our model is superior to the state-of-the-art methods. Also, the identification of novel DTIs indicates that DTI-HETA can serve as a powerful tool for integrating heterogeneous graph information to predict DTIs.
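The pipeline described in the abstract above (heterogeneous graph, graph convolution, inner product decoder) can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions: it uses one standard symmetric-normalized graph convolution layer, omits the graph attention mechanism and all training, and the function names are hypothetical rather than taken from DTI-HETA.

```python
import numpy as np


def build_hetero_adjacency(S_d, S_t, Y):
    """Stack drug-drug similarity, target-target similarity, and the DTI
    matrix into one block adjacency over drug and target nodes."""
    return np.block([[S_d, Y], [Y.T, S_t]])


def gcn_layer(A, X, W):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    A_hat = A + np.eye(A.shape[0])          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ X @ W, 0.0)


def inner_product_decoder(Z_d, Z_t):
    """Interaction probability for every drug-target pair: sigmoid(z_d . z_t)."""
    return 1.0 / (1.0 + np.exp(-(Z_d @ Z_t.T)))


# Usage with toy data: 3 drugs, 2 targets, untrained random weights.
rng = np.random.default_rng(0)
S_d = np.array([[1.0, 0.5, 0.1], [0.5, 1.0, 0.3], [0.1, 0.3, 1.0]])
S_t = np.array([[1.0, 0.4], [0.4, 1.0]])
Y = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
A = build_hetero_adjacency(S_d, S_t, Y)
X = np.eye(A.shape[0])                      # one-hot node features
W = rng.normal(scale=0.1, size=(A.shape[0], 4))
Z = gcn_layer(A, X, W)
scores = inner_product_decoder(Z[:3], Z[3:])
```

Because drugs and targets share one embedding space, the decoder needs no extra parameters, which is what allows the encoder and decoder to be trained jointly as a link prediction model.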
- Conference Article
2
- 10.1109/iceca55336.2022.10009182
- Dec 1, 2022
Drug-target interaction (DTI) prediction is an important factor in drug discovery and repositioning (DDR), since it detects the response of a drug to a target protein. Coronavirus disease 2019 (COVID-19) caused clusters of deadly pneumonia with a clinical presentation largely similar to that of SARS-CoV. Precise diagnosis of the COVID-19 clinical outcome is challenging, since the disease has various forms with varying structures. Predicting the interactions between various drugs and the SARS-CoV target protein is therefore a pressing need, which may lead to the discovery of new drugs for this deadly disease. Recently, deep learning (DL) techniques have been applied by researchers to DTI prediction. Since the convolutional neural network (CNN) is one of the major DL models able to create predictive feature vectors or embeddings, a CNN-OSBO encoder-decoder architecture has been designed for DTI prediction of COVID-19 targets. Given an input pair of a drug and a COVID-19 target, each is fed separately into a CNN encoder module combined with the Opposition-based Satin Bowerbird Optimizer (OSBO). Here, OSBO is used to tune the hyperparameters (HPs) of the CNN layers. The two encodings are then embedded to create a binding module. Finally, the CNN decoder module predicts the interaction of drugs with the COVID-19 targets by returning an affinity or interaction score. Experimental results indicate that DTI prediction using CNN+OSBO achieves better accuracy than existing techniques.
- Research Article
- 10.1038/s41467-025-66915-1
- Dec 2, 2025
- Nature communications
Drug Target Interaction (DTI) prediction is vital for drug repurposing. Previous DTI studies on the BioSNAP and BindingDB datasets often attribute biased predictions to "drug bias," while our work reveals "target prior bias" as the predominant issue. This bias stems from the "prior tendency," characterized by the imbalanced label distribution of targets in the training data. From a causal lens, the target "prior tendency" is a confounder, causing models trained with P(Y∣D,T) to learn spurious associations between targets and labels rather than genuine interaction mechanisms. In this study, we introduce alleviating Target Prior Bias in Drug-Target Interaction Prediction (TAPB), a novel debiasing framework that employs amino acid randomization, a confounder alignment module (CAM), and interventional training to compute P(Y∣D,do(T)) via backdoor adjustment, thereby addressing this bias. TAPB achieves competitive performance relative to existing approaches, demonstrating enhanced generalization and providing interpretable insights into DTIs.
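For readers unfamiliar with backdoor adjustment, the interventional quantity named in the abstract above can be written in its textbook form. The symbol C for the confounder (the target "prior tendency") and the assumptions that C is discrete and independent of the drug D are introduced here for illustration only; the abstract does not state how TAPB parameterizes or estimates these terms.

```latex
P\bigl(Y \mid D, \mathrm{do}(T)\bigr)
  = \sum_{c} P\bigl(Y \mid D, T, C = c\bigr)\, P(C = c)
```

The contrast with the observational P(Y∣D,T) is that the confounder is weighted by its marginal distribution P(C = c) rather than by its distribution given the observed target, which removes the spurious path from the target's label imbalance to the prediction.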
- Conference Article
3
- 10.1109/icaaic53929.2022.9793081
- May 9, 2022
Drug discovery is a crucial phase before drug development, since it is the most essential and distinctive means of testing all medications prior to their medical usage. Other processes in drug discovery include drug-target interaction prediction, drug repurposing or repositioning, and drug design. Prediction of drug-target interactions is crucial in these cases. Proteins, enzymes, ion channels, and other components of the human body that aid in the treatment of disease are called targets. The interaction between protein targets in the human body and chemical compounds in medications is known as drug-target interaction. In terms of time and money, laboratory experiments for drug discovery research are inefficient. Machine learning-based procedures, on the other hand, have improved the drug delivery mechanism as well as drug discovery and drug-target interaction prediction, which has aided in the prediction of novel medications and the identification of new uses for current drugs. Different ensemble learning strategies for prediction are compared in this research. Ensemble learning methods are machine learning-based methods that employ numerous independent models, similar or different, to generate an output or make predictions. Among the several computational methods for predicting drug-target interactions, ensemble learning methods are one of the chemogenomic strategies. When compared to single models, Extra Tree and Random Forest are extremely accurate ensemble learning approaches that also give low bias and low variance.
- Conference Article
1
- 10.1109/bibm.2018.8621514
- Dec 1, 2018
Drug-Target Interaction (DTI) prediction plays an important role in drug discovery and drug repurposing. DTI prediction is usually modeled as a binary classification problem. Unlike previous studies, which label unknown DTIs as negative samples, we assume that the unknown DTIs are labels that are missing not at random. For example, negative DTI labels are more likely to be missing because biomedical researchers prioritize studying DTIs that are more likely to be positive. We introduce a novel probabilistic model, Factorization with Non-random Missing Labels (FNML), for DTI prediction. FNML models the generative process for the DTI labels (i.e., whether the labels are positive or negative) and responses (i.e., whether the labels are observed or missing). In particular, the probability of observing or missing a label is associated with the sign of the label. We also conduct comprehensive experiments to validate the robust performance of the proposed models.
- Research Article
- 10.1021/acs.jcim.5c01250
- Oct 9, 2025
- Journal of chemical information and modeling
Drug-Target Interaction (DTI) prediction is an indispensable process in drug repositioning. Wet-lab experiments for potential DTI identification are reliable but expensive, labor-intensive, and time-consuming. Deep learning demonstrates the superior representation learning capability in the DTI prediction. However, there is still debate about how to accurately learn drug and protein features and further effectively fuse these features. To address the above issues, this work introduces SGcCA, an end-to-end DTI prediction framework by incorporating Spatial and Channel reconstruction Convolution (SCConv), Graph convolutional Network (GCN), and Cross-efficient-additive Attention (CEAA). First, an SCConv module is proposed to encode drug features from their SMILES strings and protein features from their amino acid sequences by reducing spatial and channel redundancies. Next, GCN is employed to encode drug features from their 2D molecular graphs. Subsequently, a CEAA block is devised to fuse the learned drug and protein features. Finally, the fused features are taken as the inputs and all unobserved drug-target pairs are classified through a multilayer perceptron. Using accuracy, F1-score, MCC, AUROC, and AUPRC as evaluation metrics, SGcCA outperformed six popular DTI prediction models (i.e., CPI-GNN, MolTrans, BACPI, CPGL, GIFDTI, and FOTF-CPI) under four different experimental scenarios on four publicly available DTI data sets (Human, C.elegans, BindingDB, and DrugBank), showcasing its better interpretability and generalization ability. Ablation study further underscored the importance of SCConv, CEAA, and GCN. Moreover, visualization of the fused features along with case study and molecular docking outcomes ensured that the predicted DTIs matched closely with the real interactions, further proving the greater performance of SGcCA. As an open-source tool, SGcCA is poised to provide support for drug repositioning. 
The source codes and data are freely available: https://github.com/plhhnu/SGcCA.
- Conference Article
18
- 10.1109/ictai50040.2020.00060
- Nov 1, 2020
Drug-target interaction (DTI) prediction plays an important role in drug repositioning, drug discovery, and drug design. In recent years, some DTI prediction methods based on machine learning have been proposed. They usually extract features from chemical genomics data. However, these methods tend to extract redundant information that is not fully related to the prediction task and to ignore the latent relationship between drug and target. This paper presents a new DTI prediction model named DTIGCCN. The model uses a spectral-based graph convolutional network (GCN) to extract features from drug and target expression profiles respectively, and a convolutional neural network (CNN) to extract latent associations between drug and target. Finally, the extracted features are concatenated and fed into an effective classifier for prediction. The advantage of DTIGCCN is that the extracted features are more refined and targeted and that the correlation between drug and target is fully exploited in the prediction. Experimental results show that our model is superior to conventional DTI prediction methods based on feature extraction and provides a new idea and method for DTI prediction.
- Research Article
40
- 10.1371/journal.pone.0226484
- Jan 16, 2020
- PLOS ONE
The identification of potential interactions between drugs and target proteins is crucial in pharmaceutical sciences. The experimental validation of interactions in genomic drug discovery is laborious and expensive; hence, there is a need for efficient and accurate in-silico techniques which can predict potential drug-target interactions to narrow down the search space for experimental verification. In this work, we propose a new framework, namely, Multi-Graph Regularized Nuclear Norm Minimization, which predicts the interactions between drugs and target proteins from three inputs: known drug-target interaction network, similarities over drugs and those over targets. The proposed method focuses on finding a low-rank interaction matrix that is structured by the proximities of drugs and targets encoded by graphs. Previous works on Drug Target Interaction (DTI) prediction have shown that incorporating drug and target similarities helps in learning the data manifold better by preserving the local geometries of the original data. But, there is no clear consensus on which kind and what combination of similarities would best assist the prediction task. Hence, we propose to use various multiple drug-drug similarities and target-target similarities as multiple graph Laplacian (over drugs/targets) regularization terms to capture the proximities exhaustively. Extensive cross-validation experiments on four benchmark datasets using standard evaluation metrics (AUPR and AUC) show that the proposed algorithm improves the predictive performance and outperforms recent state-of-the-art computational methods by a large margin. Software is publicly available at https://github.com/aanchalMongia/MGRNNMforDTI.
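The ingredients named in the abstract above (a low-rank interaction matrix, plus several graph Laplacian regularizers over drugs and targets) can be combined into a single objective. The sketch below evaluates one plausible form of such an objective; the weights `lam`, `mu_d`, `mu_t`, the observation `mask`, and the function names are assumptions made for illustration, and the exact formulation and solver should be taken from the paper and its repository.

```python
import numpy as np


def laplacian(S):
    """Unnormalized graph Laplacian L = D - S of a similarity matrix S."""
    S = np.asarray(S, dtype=float)
    return np.diag(S.sum(axis=1)) - S


def objective(X, Y, mask, drug_sims, target_sims, lam=1.0, mu_d=0.1, mu_t=0.1):
    """Assumed objective: data fit + nuclear norm + multi-graph smoothness.

    X: candidate interaction matrix (n_drugs, n_targets).
    Y: known interactions; mask: 1 where an entry of Y is observed.
    drug_sims / target_sims: lists of similarity matrices, one per graph.
    """
    fit = np.sum((mask * (Y - X)) ** 2)
    low_rank = np.linalg.norm(X, ord="nuc")  # sum of singular values
    # Tr(X^T L X) is small when similar drugs have similar interaction rows.
    smooth_d = sum(np.trace(X.T @ laplacian(S) @ X) for S in drug_sims)
    # Tr(X L X^T) is small when similar targets have similar columns.
    smooth_t = sum(np.trace(X @ laplacian(S) @ X.T) for S in target_sims)
    return float(fit + lam * low_rank + mu_d * smooth_d + mu_t * smooth_t)
```

Passing several similarity matrices in `drug_sims` and `target_sims` mirrors the abstract's point that no single similarity measure is known to be best, so all of them contribute a regularization term.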
- Research Article
185
- 10.1007/s10462-022-10283-5
- Nov 2, 2022
- Artificial Intelligence Review
Due to the dominant position of deep learning (mostly deep neural networks) in various artificial intelligence applications, ensemble learning based on deep neural networks (ensemble deep learning) has recently shown significant gains in the generalization of learning systems. However, since modern deep neural networks usually have millions to billions of parameters, the time and space overheads for training multiple base deep learners and testing with the ensemble deep learner are far greater than those of traditional ensemble learning. Though several fast ensemble deep learning algorithms have been proposed to promote the deployment of ensemble deep learning in some applications, further advances are still needed for many applications in specific fields, where development time and computing resources are usually restricted or the data to be processed is of large dimensionality. An urgent problem to be solved is how to retain the significant advantages of ensemble deep learning while reducing the required expenses, so that many more applications in specific fields can benefit from it. To alleviate this problem, it is essential to know how ensemble learning has developed in the era of deep learning. Thus, in this article, we present fundamental discussions focusing on data analyses of published works, methodologies, recent advances and unattainability of traditional ensemble learning and ensemble deep learning. We hope this article will help readers recognize the intrinsic problems and technical challenges faced by future developments of ensemble learning in the era of deep learning.
- Research Article
8
- 10.1093/bioinformatics/btad774
- Dec 22, 2023
- Bioinformatics
Drug-target interaction (DTI) prediction is a relevant but challenging task in the drug repurposing field. In-silico approaches have drawn particular attention as they can reduce the costs and time commitment associated with traditional methodologies. Yet, current state-of-the-art methods present several limitations: existing DTI prediction approaches are computationally expensive, thereby hindering the ability to use large networks and exploit available datasets; and the generalization of DTI prediction methods to unseen datasets remains unexplored, even though it could potentially improve the development processes of DTI inferring approaches in terms of accuracy and robustness. In this work, we introduce GeNNius (Graph Embedding Neural Network Interaction Uncovering System), a Graph Neural Network (GNN)-based method that outperforms state-of-the-art models in terms of both accuracy and time efficiency across a variety of datasets. We also demonstrated its predictive power to uncover new interactions by evaluating previously unknown DTIs for each dataset. We further assessed the generalization capability of GeNNius by training and testing it on different datasets, showing that this framework can potentially improve the DTI prediction task by training on large datasets and testing on smaller ones. Finally, we qualitatively investigated the embeddings generated by GeNNius, revealing that the GNN encoder maintains biological information after the graph convolutions while diffusing this information through nodes, eventually distinguishing protein families in the node embedding space. GeNNius code is available at https://github.com/ubioinformat/GeNNius.
- Research Article
14
- 10.3390/molecules27092980
- May 6, 2022
- Molecules
Drug-target interaction (DTI) prediction through in vitro methods is expensive and time-consuming. On the other hand, computational methods can save time and money while enhancing drug discovery efficiency. Most of the computational methods frame DTI prediction as a binary classification task. One important challenge is that the number of negative interactions in all DTI-related datasets is far greater than the number of positive interactions, leading to the class imbalance problem. As a result, a classifier is trained biased towards the majority class (negative class), whereas the minority class (interacting pairs) is of interest. This class imbalance problem is not widely taken into account in DTI prediction studies, and the few previous studies considering balancing in DTI do not focus on the imbalance issue itself. Additionally, they do not benefit from deep learning models and experimental validation. In this study, we propose a computational framework along with experimental validations to predict drug-target interaction using an ensemble of deep learning models to address the class imbalance problem in the DTI domain. The objective of this paper is to mitigate the bias in the prediction of DTI by focusing on the impact of balancing and maintaining other involved parameters at a constant value. Our analysis shows that the proposed model outperforms unbalanced models with the same architecture trained on the BindingDB both computationally and experimentally. These findings demonstrate the significance of balancing, which reduces the bias towards the negative class and leads to better performance. It is important to note that leaning on computational results without experimentally validating them and by relying solely on AUROC and AUPRC metrics is not credible, particularly when the testing set remains unbalanced.
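One common way to combine balancing with an ensemble, in the spirit of the abstract above, is to train each ensemble member on all positive pairs plus an equally sized random sample of negatives. The abstract does not describe the authors' sampling scheme, so the following is only a sketch under that assumption; `balanced_subsets` is a hypothetical helper name.

```python
import numpy as np


def balanced_subsets(y, n_models, seed=0):
    """Build one balanced index set per ensemble member.

    Each subset holds every positive sample and an equal number of
    negatives drawn without replacement, so each model sees a 1:1 ratio.
    Assumes there are at least as many negatives as positives.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    pos = np.flatnonzero(y == 1)
    neg = np.flatnonzero(y == 0)
    subsets = []
    for _ in range(n_models):
        sampled_neg = rng.choice(neg, size=len(pos), replace=False)
        subsets.append(np.concatenate([pos, sampled_neg]))
    return subsets


# Usage: 3 interacting pairs among 23 labeled pairs, 4 ensemble members.
y = np.array([1] * 3 + [0] * 20)
subsets = balanced_subsets(y, n_models=4)
```

Each model would then be trained on `X[idx], y[idx]` for its own `idx`, and their predicted probabilities averaged; because the test set typically stays unbalanced, the abstract's caution about relying solely on AUROC and AUPRC still applies.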