Abstract

Background: Exploring the relationship between diseases and genes is of great significance for understanding disease pathogenesis and developing corresponding therapeutic measures. Predicting disease-gene associations by computational methods accelerates this process.

Results: Because the data are multi-source and heterogeneous, many existing methods cannot fully utilize the multi-dimensional relationships among biological entities to predict disease-gene associations. This paper proposes FactorHNE, a factor graph-aggregated heterogeneous network embedding method for disease-gene association prediction, which captures a variety of semantic relationships between heterogeneous nodes by factorization. It produces different semantic factor graphs and effectively aggregates the semantic relationships they encode, using an end-to-end multi-perspective loss function to optimize the model. The resulting node embeddings are then used to predict disease-gene associations.

Conclusions: Experimental verification and analysis show that FactorHNE has better performance and scalability than existing models. It also has good interpretability and can be extended to large-scale biomedical network data analysis.

Highlights

  • Exploring the relationship between disease and gene is of great significance for understanding the pathogenesis of disease and developing corresponding therapeutic measures

  • To assess the performance of a link prediction model, we adopt average precision (AP), area under the curve (AUC), Precision@K, and Recall@K, which are commonly used in model evaluation

  • AP is the area under the precision-recall (P–R) curve of the model, and AUC is the area under its receiver operating characteristic (ROC) curve; both indicators are commonly used to evaluate prediction tasks. Precision@K and Recall@K denote the precision and recall computed with the Kth largest predicted score as the threshold
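
The four metrics named in the highlights can be illustrated with a small, dependency-free sketch. This is not the authors' evaluation code: the function names, the toy labels, and the toy scores below are assumptions made up for illustration, and AP is computed with the usual step-wise sum over ranked positives, which approximates the area under the P–R curve.

```python
from typing import List, Sequence, Tuple


def _ranked(scores: Sequence[float]) -> List[int]:
    """Indices sorted by descending score (ties keep input order)."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise AP: mean of the precision at the rank of each true positive."""
    total_pos = sum(labels)
    if total_pos == 0:
        return 0.0
    hits, ap = 0, 0.0
    for rank, i in enumerate(_ranked(scores), start=1):
        if labels[i] == 1:
            hits += 1
            ap += hits / rank  # precision at this positive's rank
    return ap / total_pos


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def precision_recall_at_k(
    labels: Sequence[int], scores: Sequence[float], k: int
) -> Tuple[float, float]:
    """Treat the K highest-scored pairs as predicted positives."""
    if not 1 <= k <= len(scores):
        raise ValueError("k must be between 1 and the number of scored pairs")
    hits = sum(labels[i] for i in _ranked(scores)[:k])
    total_pos = sum(labels)
    recall = hits / total_pos if total_pos else 0.0
    return hits / k, recall


# Toy example (made-up data): 1 = known disease-gene association, 0 = no known association.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]

print("AP:", average_precision(labels, scores))
print("AUC:", roc_auc(labels, scores))
print("P@3, R@3:", precision_recall_at_k(labels, scores, 3))
```

When no scores are tied, taking the top K pairs is equivalent to thresholding at the Kth largest score, which is how the highlight describes Precision@K and Recall@K.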

Summary

Introduction

Exploring the relationship between diseases and genes is of great significance for understanding disease pathogenesis and developing corresponding therapeutic measures. Predicting disease-gene associations by computational methods accelerates this process. In biomedical research, disease-gene association prediction is a fundamental and important problem [1, 2]. With advances in machine learning and artificial intelligence, many machine learning methods have been applied to discover new genetic associations of diseases. Nevertheless, many challenges remain in this research area. First, the full set of genes is much larger than the set of confirmed disease-related genes, so the patterns of disease-gene association must be mined from relatively little labeled data. Second, the genetic heterogeneity of diseases makes these patterns diverse, which further increases the difficulty of mining them.
