FKSUDDAPre: A drug-disease association prediction framework based on F-TEST feature selection and AMDKSU resampling with interpretability analysis.
In drug discovery and therapeutic research, the prediction of drug-disease associations (DDAs) holds significant scientific and clinical value. Drug molecules exert their effects by precisely identifying disease-related biological targets, systematically modulating the entire pharmacological process from absorption, distribution, and metabolism to final efficacy. Accurate prediction of drug-disease associations not only facilitates an in-depth understanding of the molecular mechanisms of drug action but also provides a critical theoretical foundation for drug repositioning and personalized medicine. While traditional prediction methods based on in vitro experiments and clinical statistics yield reliable results, they suffer from inherent drawbacks such as long development cycles, substantial resource consumption, and low throughput. In contrast, emerging machine learning techniques offer a promising solution to these bottlenecks, enabling the efficient discovery of potential drug-disease association networks and improving drug development efficiency. However, existing machine learning methods still face significant challenges in practice: complex feature construction raises the barrier to data processing; data sparsity limits the depth of information mining; and pervasive sample imbalance undermines the model's predictive accuracy and generalization performance. In this study, we developed an efficient and accurate framework for drug-disease association prediction named FKSUDDAPre. The model employs a multi-modal feature fusion strategy: on one hand, it leverages an ensemble of Mol2vec and K-BERT to capture the semantic features of drug molecular fingerprints; on the other hand, it integrates Medical Subject Headings (MeSH) with DeepWalk to reduce the dimensionality of disease features while preserving their relational structure.
To address the class imbalance problem, FKSUDDAPre uses an optimization algorithm called AMDKSU, which combines clustering with an improved distance metric strategy to enhance the discriminative power of the sample set. For data processing, an F-test is employed for feature importance ranking, reducing data dimensionality and improving model generalization. For the predictive architecture, FKSUDDAPre uses a novel ensemble framework composed of XGBoost, Decision Tree, Random Forest, and HyperFast. By employing a dynamic weight allocation strategy, this ensemble harnesses the complementary strengths of these models to achieve enhanced predictive performance. Rigorous validation demonstrated strong performance across multiple evaluation metrics, with an average AUC of 0.9725, approximately 3.88% higher than that of the best-performing baseline model. In the prediction of Alzheimer's disease and Parkinson's disease, 80% and 60% of the top 10 candidate drugs recommended by FKSUDDAPre, respectively, have been confirmed by the literature, demonstrating the model's practical application potential. Furthermore, we conducted a LIME-based feature importance analysis on the model's predictions, visualizing the correlations between features and the target variable to demonstrate the model's interpretability. A cross-platform, user-friendly visualization tool has also been developed using the PyQt5 framework.
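The F-test ranking step mentioned in the abstract can be illustrated with a one-way ANOVA F statistic computed per feature. This is a minimal sketch and not the authors' implementation: the function names `f_statistic` and `rank_features`, the toy data, and the choice of keeping the top `top_k` features are all assumptions made for illustration, and the sketch assumes at least two classes are present.

```python
from statistics import mean


def f_statistic(values, labels):
    """One-way ANOVA F statistic of a single feature across label groups."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    grand = mean(values)
    # Variation of group means around the grand mean.
    between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    # Variation of samples around their own group mean.
    within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return (between / (k - 1)) / (within / (n - k))


def rank_features(X, y, top_k):
    """Return the indices of the top_k highest-scoring columns and all scores."""
    scores = [f_statistic([row[j] for row in X], y) for j in range(len(X[0]))]
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return order[:top_k], scores


# Hypothetical toy data: 4 drug-disease pairs, 3 features, binary labels.
X = [
    [0.1, 5.0, 1.0],
    [0.2, 4.0, 1.1],
    [0.9, 5.5, 0.9],
    [1.0, 4.5, 1.0],
]
y = [0, 0, 1, 1]
selected, scores = rank_features(X, y, top_k=1)
print(selected, scores)
```

In this toy example the first feature separates the two classes far better than the others, so it receives the highest F score and is the one retained.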
- # Drug-disease Associations
- # Molecular Mechanisms Of Drug Action
- # Drug-disease Association Prediction
- # Existing Machine Learning Methods
- # Improving Model Generalization
- # Multiple Evaluation Metrics
- # Model Generalization Performance
- # Prediction Of Alzheimer
- # Model's Predictive Accuracy
- # Model's Predictions
- Research Article
7
- 10.1186/s12859-024-06032-w
- Jan 7, 2025
- BMC Bioinformatics
New drug development is a complex process, and drug-disease association (DDA) prediction aims to identify new therapeutic uses for existing medications. However, existing graph contrastive learning approaches typically rely on single-view contrastive learning, which struggles to fully capture drug-disease relationships. To address this limitation, we introduce a novel multi-view contrastive learning framework, named CDPMF-DDA, which enhances the model's ability to capture drug-disease associations by incorporating diverse information representations from different views. First, we decompose the original drug-disease association matrix into drug and disease feature matrices, which are then used to reconstruct the drug-disease association network, as well as the drug-drug and disease-disease similarity networks. This process reduces noise in the data, establishing a reliable foundation for the networks produced. Next, we generate multiple contrastive views from both the original and generated networks. These views capture hidden feature associations, enhancing the model's ability to represent complex relationships. Extensive cross-validation experiments on three standard datasets show that CDPMF-DDA achieves an average AUC of 0.9475 and an AUPR of 0.5009, outperforming existing models. Additionally, case studies on Alzheimer’s disease and epilepsy further validate the model’s effectiveness, demonstrating its accuracy and robustness in drug-disease association prediction. Based on a multi-view contrastive learning framework, CDPMF-DDA is capable of integrating multi-source information and capturing complex drug-disease associations, making it a powerful tool for drug repositioning and the discovery of new therapeutic strategies.
- Research Article
14
- 10.1016/j.artmed.2024.102805
- Feb 17, 2024
- Artificial Intelligence In Medicine
GCNGAT: Drug–disease association prediction based on graph convolution neural network and graph attention network
- Research Article
322
- 10.1093/bib/bbaa243
- Oct 20, 2020
- Briefings in Bioinformatics
Determining drug-disease associations is an integral part of the drug development process. However, the identification of drug-disease associations through wet experiments is costly and inefficient. Hence, the development of efficient and high-accuracy computational methods for predicting drug-disease associations is of great significance. In this paper, we propose a novel computational method named layer attention graph convolutional network (LAGCN) for drug-disease association prediction. Specifically, LAGCN first integrates the known drug-disease associations, drug-drug similarities and disease-disease similarities into a heterogeneous network, and applies the graph convolution operation to the network to learn the embeddings of drugs and diseases. Second, LAGCN combines the embeddings from multiple graph convolution layers using an attention mechanism. Third, the unobserved drug-disease associations are scored based on the integrated embeddings. Evaluated by 5-fold cross-validation, LAGCN achieves an area under the precision-recall curve of 0.3168 and an area under the receiver-operating characteristic curve of 0.8750, which are better than the results of existing state-of-the-art prediction methods and baseline methods. The case study shows that LAGCN can discover novel associations that are not curated in our dataset. LAGCN is a useful tool for predicting drug-disease associations. This study reveals that embeddings from different convolution layers can reflect the proximities of different orders, and combining the embeddings by the attention mechanism can improve prediction performance.
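The second step, combining embeddings from several graph convolution layers with attention, can be sketched as a weighted sum of the per-layer embedding matrices. This is only an illustration under stated assumptions: the abstract does not say how the attention weights are parameterized, so the softmax normalization, the function names, and the toy embeddings below are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scalars."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combine_layers(layer_embeddings, attention_logits):
    """Weighted sum of per-layer node embeddings (one weight per layer)."""
    weights = softmax(attention_logits)
    n_nodes = len(layer_embeddings[0])
    dim = len(layer_embeddings[0][0])
    combined = [[0.0] * dim for _ in range(n_nodes)]
    for w, emb in zip(weights, layer_embeddings):
        for i in range(n_nodes):
            for d in range(dim):
                combined[i][d] += w * emb[i][d]
    return combined, weights


# Hypothetical embeddings of 2 nodes from 2 graph convolution layers.
layer1 = [[1.0, 0.0], [0.0, 1.0]]
layer2 = [[3.0, 2.0], [2.0, 1.0]]
combined, weights = combine_layers([layer1, layer2], attention_logits=[0.0, 0.0])
print(weights, combined)
```

With equal logits the layers are simply averaged; in a trained model the logits would be learned so that layers capturing more useful proximity orders receive larger weights.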
- Conference Article
8
- 10.1109/bibm.2018.8621191
- Dec 1, 2018
Predicting drug-disease associations using computational methods benefits drug repositioning. Drug-disease associations are events in which drugs exert effects on diseases, and these effects differ in kind. For example, drug-disease associations are annotated as therapeutic or marker/mechanism (non-therapeutic) in the Comparative Toxicogenomics Database (CTD). However, existing association prediction methods ignore the effects that drugs exert on diseases. In this paper, we propose a signed network-based nonnegative matrix factorization method (SNNMF) to predict drug-disease associations and their effects. First, drug-disease associations are represented as a signed bipartite network with two types of links for therapeutic effects and non-therapeutic effects. After decomposing the network into two subnetworks, SNNMF approximates the association matrix of each subnetwork by two nonnegative matrices, which are low-dimensional latent representations for drugs and diseases respectively, and diseases in the two subnetworks share the same latent representations. In the computational experiments, SNNMF performs well in predicting the effects of drug-disease associations. Moreover, SNNMF accurately predicts drug-disease associations and outperforms existing association prediction methods. Case studies show that SNNMF helps to find novel drug-disease associations that are not included in CTD, and simultaneously predicts their therapeutic effects.
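The factorization described above can be written compactly. The formulation below is a sketch inferred from the abstract alone: the symbols $A^{+}$, $A^{-}$, $U^{+}$, $U^{-}$, and $V$ are notation introduced here, and any regularization or weighting terms used in the paper are not stated in the abstract and are therefore omitted.

```latex
% Assumed notation: m drugs, n diseases, latent dimension k.
% A^{+}, A^{-} \in \{0,1\}^{m \times n}: therapeutic and non-therapeutic subnetworks.
% U^{+}, U^{-} \in \mathbb{R}_{\ge 0}^{m \times k}: drug representations per subnetwork.
% V \in \mathbb{R}_{\ge 0}^{n \times k}: disease representation shared by both subnetworks.
\min_{U^{+},\, U^{-},\, V \,\ge\, 0}\;
  \left\lVert A^{+} - U^{+} V^{\top} \right\rVert_F^{2}
  + \left\lVert A^{-} - U^{-} V^{\top} \right\rVert_F^{2}
```

Sharing $V$ across both terms is what ties the two subnetworks together: evidence about a disease from therapeutic links also shapes its representation for non-therapeutic links.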
- Research Article
24
- 10.3389/fbioe.2020.00218
- Apr 9, 2020
- Frontiers in Bioengineering and Biotechnology
Identifying drug-disease associations is integral to drug development. Computationally prioritizing candidate drug-disease associations has attracted growing attention due to its contribution to reducing the cost of laboratory screening. Drug-disease associations involve different association types, such as drug indications and drug side effects. However, the existing models for predicting drug-disease associations concentrate only on independent tasks: recommending novel indications to benefit drug repositioning, predicting potential side effects to prevent drug-induced risk, or only determining the existence of a drug-disease association. They ignore crucial prior knowledge of the correlations between different association types. Since the Comparative Toxicogenomics Database (CTD) annotates the drug-disease associations as therapeutic or marker/mechanism, we consider predicting the two types of association. To this end, we propose a collective matrix factorization-based multi-task learning method (CMFMTL) in this paper. CMFMTL handles the problem as multi-task learning where each task is to predict one type of association, and the two tasks complement and improve each other by capturing the relatedness between them. First, drug-disease associations are represented as a bipartite network with two types of links representing therapeutic effects and non-therapeutic effects. Then, CMFMTL approximates the association matrix for each link type by matrix tri-factorization, and shares the low-dimensional latent representations for drugs and diseases across the two related tasks for the goal of collective learning. Finally, CMFMTL puts the two tasks into a unified framework, and an efficient algorithm is developed to solve the proposed optimization problem. In the computational experiments, CMFMTL outperforms several state-of-the-art methods in both tasks.
Moreover, case studies show that CMFMTL helps to find novel drug-disease associations that are not included in CTD, and simultaneously predicts their association types.
- Research Article
38
- 10.1016/j.asoc.2021.107811
- Aug 13, 2021
- Applied Soft Computing
Drug–disease associations prediction via Multiple Kernel-based Dual Graph Regularized Least Squares
- Research Article
4
- 10.3934/mbe.2021367
- Jan 1, 2021
- Mathematical Biosciences and Engineering
The development of new drugs is a time-consuming and labor-intensive process. Therefore, researchers use computational methods to explore other therapeutic effects of existing drugs, and drug-disease association prediction is an important branch of this work. Existing drug-disease association prediction methods ignore the prior knowledge contained in drug-disease association data, which provides a strong basis for such research. Moreover, previous methods attended only to the high-level features in the network when extracting features, and directly fused or concatenated them, resulting in a loss of information. Therefore, we propose a novel deep learning model for drug-disease association prediction, called DCNN. The model introduces the Gaussian interaction profile kernel similarity for drugs and diseases, and combines it with the structural similarity of drugs and the semantic similarity of diseases to jointly construct the feature space. Then a dense convolutional neural network (DenseCNN) is used to capture the feature information of drugs and diseases, and a convolutional block attention module (CBAM) is introduced to weight features at the channel and spatial levels to achieve adaptive optimization of features. The ten-fold cross-validation results of DCNN and the experimental results of the case study show that it is superior to existing drug-disease association predictors and effectively predicts drug-disease associations.
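The Gaussian interaction profile (GIP) kernel similarity mentioned above is commonly defined on the rows of the association matrix as $K(i, j) = \exp(-\gamma \lVert a_i - a_j \rVert^2)$, with the bandwidth $\gamma$ normalized by the mean squared norm of the profiles. The sketch below follows that common definition; the abstract does not give the exact bandwidth used in DCNN, so `gamma_prime=1.0`, the function name, and the toy matrix are assumptions.

```python
import math


def gip_kernel(assoc, gamma_prime=1.0):
    """Gaussian interaction profile kernel between the rows of a 0/1 matrix."""
    n = len(assoc)
    sq_norms = [sum(v * v for v in row) for row in assoc]
    mean_sq_norm = sum(sq_norms) / n
    if mean_sq_norm == 0:
        raise ValueError("association matrix has no known associations")
    # Bandwidth normalized by the average squared profile norm.
    gamma = gamma_prime / mean_sq_norm
    return [
        [
            math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(assoc[i], assoc[j])))
            for j in range(n)
        ]
        for i in range(n)
    ]


# Hypothetical drug-disease association matrix: 3 drugs x 3 diseases.
assoc = [
    [1, 0, 1],
    [1, 0, 1],
    [0, 1, 0],
]
K = gip_kernel(assoc)
print(K)
```

Drugs with identical interaction profiles get similarity 1, and similarity decays as the profiles diverge. Applying the same function to the transposed matrix would give the disease-side kernel.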
- Research Article
1
- 10.1109/jbhi.2025.3542784
- Nov 1, 2025
- IEEE Journal of Biomedical and Health Informatics
Research on identifying drug-disease associations (DDAs) is widely used in scenarios such as drug development, clinical decision-making, and drug repurposing, and holds considerable biological and medical significance. Although existing methods for drug-disease association prediction have achieved decent performance, they primarily rely on simplistic drug-disease association graphs or similarity graphs. These methods often struggle to capture the high-order correlations of complex multimodal data, limiting their ability to handle the complexity of data associations effectively. In addition, real drug-disease associations are highly sparse, posing a significant challenge to prediction accuracy. To tackle these issues, we propose a general hypergraph neural network framework for drug-disease association prediction based on hierarchical contrastive learning and cross-attention learning. It leverages hypergraph neural networks to learn representations of drugs and diseases carrying high-order correlations and strengthens representation quality using interactive attention learning and hierarchical contrastive learning. Meanwhile, a $\lambda$-weighted loss function is used to adapt to the high sparsity of real drug-disease associations during model training and improve prediction performance. Extensive experiments demonstrate that DD-HGNN$^+$ surpasses other state-of-the-art methods in predicting drug-disease associations, and further validation through case studies on Leukemia and Colorectal Neoplasms underscores its reliability.
- Research Article
4
- 10.1109/jbhi.2023.3300717
- Oct 1, 2023
- IEEE Journal of Biomedical and Health Informatics
Predicting drug-disease associations (DDAs) through computational methods has become a prevalent trend in drug development because of their high efficiency and low cost. Existing methods usually focus on constructing heterogeneous networks by collecting multiple data resources to improve prediction ability. However, potential association possibilities of numerous unconfirmed drug-related or disease-related pairs are not sufficiently considered. In this article, we propose a novel computational model to predict new DDAs. First, a heterogeneous network is constructed, including four types of nodes (drugs, targets, cell lines, diseases) and three types of edges (associations, association scores, similarities). Second, an updating and merging-based similarity network fusion method, termed UM-SF, is presented to fuse various similarity networks with diverse weights. Finally, an intermediate layer-mediated multi-view feature projection representation method, termed IM-FP, is proposed to calculate the predicted DDA scores. This method uses multiple association scores to construct multi-view drug features, then projects them into disease space through the intermediate layer, where an intermediate layer similarity constraint is designed to learn the projection matrices. Results of comparative experiments reveal the effectiveness of our innovations. Comparisons with other state-of-the-art models by the 10-fold cross-validation experiment indicate our model's advantage on AUROC and AUPR metrics. Moreover, our proposed model successfully predicted 107 novel high-ranked DDAs.
- Research Article
2
- 10.1089/cmb.2023.0135
- Sep 1, 2023
- Journal of Computational Biology
In the field of drug development and repositioning, the prediction of drug-disease associations is a critical task. A recently proposed method for predicting drug-disease associations based on graph convolution relies heavily on the features of adjacent nodes within the homogeneous network for characterizing information. However, this method omits node attribute information from heterogeneous networks, which limits the insight it can provide for predicting drug-disease associations. In this study, a novel drug-disease association prediction model called DAHNGC is proposed, which is based on a graph convolutional neural network. This model includes two feature extraction methods that are specifically designed to extract the attribute characteristics of drugs and diseases from both homogeneous and heterogeneous networks. First, the DropEdge technique is added to the graph convolutional neural network to alleviate the oversmoothing problem and obtain the characteristics of the same nodes of drugs or diseases in the homogeneous network. Then, an automatic feature extraction method in the heterogeneous network is designed to obtain the features of drugs or diseases at different nodes. Finally, the obtained features are fed into a fully connected network for nonlinear transformation, and the potential drug-disease pairs are obtained by bilinear decoding. Experimental results demonstrate that the DAHNGC model exhibits good predictive performance for drug-disease associations.
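DropEdge, as mentioned above, randomly removes a fraction of edges from the graph before each training pass so that repeated neighborhood averaging is less prone to oversmoothing. The sketch below shows only that sampling step on a plain edge list; the function name, the drop rate, and the toy graph are assumptions, and how DAHNGC wires this into its graph convolution is not described in the abstract.

```python
import random


def drop_edge(edges, drop_rate, rng):
    """Return a random subset of edges with about drop_rate of them removed."""
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must be in [0, 1)")
    n_keep = len(edges) - int(len(edges) * drop_rate)
    # Sampling without replacement keeps each surviving edge exactly once.
    return rng.sample(edges, n_keep)


# Hypothetical undirected drug-drug similarity graph as an edge list.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
rng = random.Random(0)
kept = drop_edge(edges, drop_rate=0.4, rng=rng)
print(kept)
```

In training, a fresh subset would typically be drawn at every epoch, while the full edge list is used at inference time.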
- Research Article
35
- 10.1093/bioinformatics/btad357
- Jun 1, 2023
- Bioinformatics
An imperative step in drug discovery is the prediction of drug-disease associations (DDAs), which aims to uncover potential therapeutic possibilities for already validated drugs. It is costly and time-consuming to predict DDAs using wet experiments. Graph Neural Networks, as an emerging technique, have shown superior capacity for DDA prediction. However, existing Graph Neural Network-based DDA prediction methods suffer from sparse supervised signals. As graph contrastive learning has proven effective in mitigating sparse supervised signals, we seek to leverage it to enhance the prediction of DDAs. Unfortunately, most conventional graph contrastive learning-based models corrupt the raw data graph to augment data, which is unsuitable for DDA prediction. Moreover, these methods cannot model the interactions between nodes effectively, which reduces the accuracy of association predictions. We propose a model called Similarity Measures-based Graph Co-contrastive Learning (SMGCL) to identify potential drug candidates for diseases. To learn embeddings from complicated network topologies, SMGCL includes three essential processes: (i) three views are constructed based on similarities between drugs and diseases and on DDA information; (ii) two graph encoders are applied over the three views to model both local and global topologies simultaneously; and (iii) a graph co-contrastive learning method co-trains the representations of nodes to maximize the agreement between them, thus generating high-quality prediction results. Contrastive learning serves as an auxiliary task for improving DDA predictions. Evaluated by cross-validation, SMGCL achieves good overall performance. A case study of Alzheimer's disease provides further evidence of SMGCL's practicality. https://github.com/Jcmorz/SMGCL.
- Supplementary Content
18
- 10.3390/biom12101497
- Oct 17, 2022
- Biomolecules
Drug repositioning, which involves the identification of new therapeutic indications for approved drugs, considerably reduces the time and cost of developing new drugs. Recent computational drug repositioning methods use heterogeneous networks to identify drug–disease associations. This review surveys existing network-based approaches for predicting drug–disease associations in three major categories: graph mining, matrix factorization or completion, and deep learning. We selected eleven methods from the three categories to compare their predictive performance. The experiment was conducted using two uniform datasets on the drug and disease sides, separately. We constructed heterogeneous networks using drug–drug similarities based on chemical structures and ATC codes, ontology-based disease–disease similarities, and drug–disease associations. An improved evaluation metric was used to reflect data imbalance, as positive associations are typically sparse. The prediction results demonstrated that methods in the graph mining and matrix factorization or completion categories performed well in the overall assessment. Furthermore, prediction on the drug side had higher accuracy than on the disease side. Selecting and integrating informative drug features in drug–drug similarity measurement is crucial for improving disease-side prediction.
- Research Article
8
- 10.3390/ijms20174102
- Aug 22, 2019
- International Journal of Molecular Sciences
Identifying new indications for existing drugs may reduce costs and expedite drug development. Drug-related disease predictions have typically combined heterogeneous drug-related and disease-related data to derive the associations between drugs and diseases, while recently developed approaches integrate multiple kinds of drug features but fail to take the diversity implied by these features into account. We developed a method based on non-negative matrix factorization, DivePred, for predicting potential drug–disease associations. DivePred integrated disease similarity, drug–disease associations, and various drug features derived from drug chemical substructures, drug target protein domains, drug target annotations, and drug-related diseases. Diverse drug features reflect the characteristics of drugs from different perspectives, and utilizing the diversity of multiple kinds of features is critical for association prediction. The various drug features were high-dimensional and sparse, so DivePred projected them into a low-dimensional feature space to generate dense feature representations of drugs. Furthermore, DivePred’s optimization term enhanced diversity and reduced redundancy among the multiple kinds of drug features. The neighbor information was exploited to infer the likelihood of drug–disease associations. Experiments indicated that DivePred was superior to several state-of-the-art methods for predicting drug–disease associations. During the validation process, DivePred identified more drug–disease associations among the top-ranked predictions than other methods, benefitting further biological validation. Case studies of acetaminophen, ciprofloxacin, doxorubicin, hydrocortisone, and ampicillin demonstrated that DivePred has the ability to discover potential candidate disease indications for drugs.
- Research Article
2
- 10.1093/bib/bbac123
- Apr 8, 2022
- Briefings in Bioinformatics
Identifying new uses of approved drugs is an effective way to reduce the time and cost of drug development. Recent computational approaches for predicting drug-disease associations have integrated multi-sourced data on drugs and diseases. However, neighboring topologies of various scales in multiple heterogeneous drug-disease networks have yet to be exploited and fully integrated. We propose a novel method for drug-disease association prediction, called MGPred, which encodes and learns multi-scale neighboring topologies of drug and disease nodes and pairwise attributes from heterogeneous networks. First, we constructed three heterogeneous networks based on multiple kinds of drug similarities. Each network comprises drug and disease nodes and edges created based on node-wise similarities and associations that reflect specific topological structures. We also propose an embedding mechanism to formulate topologies that cover different ranges of neighbors. To encode the embeddings and derive multi-scale neighboring topology representations of drug and disease nodes, we propose a module based on graph convolutional autoencoders with shared parameters for each heterogeneous network. We also propose scale-level attention to obtain an adaptive fusion of informative topological representations at different scales. Finally, a learning module based on a convolutional neural network with various receptive fields is proposed to learn multi-view attribute representations of a pair of drug and disease nodes. Comprehensive experimental results demonstrate that MGPred outperforms other state-of-the-art methods in drug-related disease prediction, and the recall rates for the top-ranked candidates and case studies on five drugs further demonstrate the ability of MGPred to retrieve potential drug-disease associations.
- Research Article
11
- 10.1108/dta-01-2019-0004
- Jun 7, 2019
- Data Technologies and Applications
Purpose: The traditional drug development process is costly, time consuming and risky. Using computational methods to discover drug repositioning opportunities is a promising and efficient strategy in the era of big data. The explosive growth of large-scale genomic, phenotypic data and all kinds of “omics” data brings opportunities for developing new computational drug repositioning methods based on big data. The paper aims to discuss this issue.
Design/methodology/approach: Here, a new computational strategy is proposed for inferring drug–disease associations from rich biomedical resources toward drug repositioning. First, the network embedding (NE) algorithm is adopted to learn the latent feature representation of drugs from multiple biomedical resources. Furthermore, on the basis of the latent vectors of drugs from the NE module, a binary support vector machine classifier is trained to divide unknown drug–disease pairs into positive and negative instances. Finally, this model is validated on a well-established drug–disease association data set with tenfold cross-validation.
Findings: This model obtains the performance of an area under the receiver operating characteristic curve of 90.3 percent, which is comparable to those of similar systems. The authors also analyze the performance of the model and validate its effect on predicting the new indications of old drugs.
Originality/value: This study shows that the authors’ method is predictive, identifying novel drug–disease interactions for drug discovery. The new feature learning methods also positively contribute to the heterogeneous data integration.
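The area under the receiver operating characteristic curve reported above can be read as the probability that a randomly chosen positive pair is scored higher than a randomly chosen negative pair. The sketch below computes it directly from that definition; the function name and the toy scores are assumptions, and the quadratic pairwise loop is chosen for clarity rather than efficiency.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of positive-negative pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    # A tie between a positive and a negative counts as half a correct pair.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores for four drug-disease pairs.
scores = [0.9, 0.8, 0.3, 0.1]
labels = [1, 0, 1, 0]
auc = roc_auc(scores, labels)
print(auc)
```

Under tenfold cross-validation, this value would be computed on each held-out fold and then averaged.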