Abstract

Background: Identifying potential associations between genes and diseases through biomedical experiments is time-consuming and expensive. Computational approaches based on machine learning models have been widely used to explore genetic information related to complex diseases. Importantly, gene-disease association detection can be formulated as a link prediction problem in a bipartite network. However, many existing methods do not utilize multiple sources of biological information; additionally, they do not extract higher-order relationships among genes and diseases.

Results: In this study, we propose a novel method called Dual Hypergraph Regularized Least Squares (DHRLS) with Centered Kernel Alignment-based Multiple Kernel Learning (CKA-MKL) to detect potential gene-disease associations. First, we construct multiple kernels from various biological data sources in the gene space and the disease space, respectively. We then use CKA-MKL to obtain the optimal kernel in each of the two spaces. Specifically, hypergraphs are employed to establish higher-order relationships. Finally, the DHRLS model is solved by the Alternating Least Squares Algorithm (ALSA) to predict gene-disease associations.

Conclusion: Compared with many outstanding prediction tools, DHRLS achieves the best performance on the gene-disease association network under two types of cross validation. To verify robustness, the proposed approach is also evaluated on six real-world networks, where it shows excellent prediction performance. Our work can effectively discover potential disease-associated genes and provide guidance for follow-up verification studies of complex diseases.
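The kernel-combination step mentioned in the abstract can be illustrated with a small sketch. Centered kernel alignment between two kernel matrices is the Frobenius cosine of their centered versions. The weighting scheme below (weights proportional to each kernel's non-negative alignment with a target kernel built from known associations) is a simplified heuristic chosen for illustration, not necessarily the optimization the authors solve. The names `center_kernel`, `cka`, and `alignment_weights`, along with the toy data, are assumptions introduced here.

```python
import numpy as np

def center_kernel(K):
    """Return H K H, where H = I - (1/n) 11^T is the centering matrix."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def cka(K1, K2):
    """Centered kernel alignment: Frobenius cosine of the centered kernels."""
    K1c, K2c = center_kernel(K1), center_kernel(K2)
    denom = np.linalg.norm(K1c) * np.linalg.norm(K2c)  # Frobenius norms
    return float(np.sum(K1c * K2c) / denom) if denom > 0 else 0.0

def alignment_weights(kernels, target):
    """Heuristic (assumed) weights: non-negative alignment with the target, normalized."""
    scores = np.array([max(cka(K, target), 0.0) for K in kernels])
    if scores.sum() == 0:
        # No kernel aligns with the target: fall back to uniform weights.
        return np.full(len(kernels), 1.0 / len(kernels))
    return scores / scores.sum()

# Toy, randomly generated data (hypothetical): 6 genes, 4 diseases.
rng = np.random.default_rng(0)
Y = (rng.random((6, 4)) < 0.4).astype(float)   # binary gene-disease association matrix
X1, X2 = rng.random((6, 3)), rng.random((6, 5))  # two gene feature views
kernels = [X1 @ X1.T, X2 @ X2.T]               # linear kernels in gene space
target = Y @ Y.T                               # target kernel from known associations

w = alignment_weights(kernels, target)
K_gene = sum(wi * K for wi, K in zip(w, kernels))  # combined gene-space kernel
```

The same procedure would apply in the disease space, using `Y.T @ Y` as the target kernel. Because the weights are non-negative and sum to one, the combined matrix stays symmetric and positive semidefinite whenever the input kernels are.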

Highlights

  • Identifying potential associations between genes and diseases through biomedical experiments is time-consuming and expensive

  • Compared with other state-of-the-art methods for predicting gene-disease associations, including CMF, GRMF and Spa-LapRLS (a Laplacian Regularized Least Squares variant), Dual Hypergraph Regularized Least Squares (DHRLS) achieves the highest area under the receiver operating characteristic curve (AUC) and area under the precision-recall curve (AUPR) in 10-fold cross validation under CV1; under CV2, however, it achieves a lower AUC than Spa-LapRLS

  • To better test the performance of our method, the proposed approach is evaluated on a real gene-disease association dataset under two types of cross validation

Introduction

Identifying potential associations between genes and diseases through biomedical experiments is time-consuming and expensive. Matrix completion methods [11,12,13], a class of machine learning methods, can address this problem by computing similarity information and predicting associations between diseases and genes, but they usually take a long time to converge to a locally optimal solution. These models are based on the assumption that genes with high similarity are related to similar diseases. They are biased by the network topology and rely on effective similarity information. It is also not easy for these methods to integrate multiple sources of information about genes and diseases.
