Abstract

Understanding the role of genes in human disease is highly important. However, identifying genes associated with human diseases requires laborious experiments that involve considerable effort and time. Computational approaches to predicting candidate genes related to complex diseases, including cancer, have therefore been studied extensively. In this study, we propose a convolutional neural network-based knowledge graph-embedding model (KGED), which uses a biological knowledge graph with entity descriptions to infer relationships between biological entities. As an application demonstration, we generated gene-interaction networks for each cancer type using gene-gene relationships inferred by KGED. We then analyzed the constructed gene networks using network centrality measures, including betweenness, closeness, degree, and eigenvector centrality, to rank the central genes of each network and identify highly correlated cancer genes. Furthermore, we evaluated the proposed approach for prostate, breast, and lung cancers by comparing its performance with that of existing approaches. The KGED model showed improved performance in predicting cancer-related genes using the inferred gene-gene interactions. Thus, we conclude that gene-gene interactions inferred by KGED can be helpful for future research, such as studies of the pathogenic mechanisms of human diseases, and can contribute to the field of disease treatment discovery.
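
The centrality-based ranking step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes an undirected, unweighted, connected network, covers only two of the four named measures (degree and closeness), and uses hypothetical names throughout (TOY_EDGES, build_graph, rank_genes, and the GENE_* labels are placeholders, not results from the study).

```python
from collections import deque

# Hypothetical toy edges standing in for inferred gene-gene relationships.
TOY_EDGES = [
    ("GENE_A", "GENE_B"),
    ("GENE_A", "GENE_C"),
    ("GENE_A", "GENE_D"),
    ("GENE_D", "GENE_E"),
]


def build_graph(edges):
    """Build undirected adjacency sets from (gene, gene) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def degree_centrality(graph):
    """Fraction of the other nodes that each node is directly linked to."""
    n = len(graph)
    if n < 2:
        return {v: 0.0 for v in graph}
    return {v: len(nbrs) / (n - 1) for v, nbrs in graph.items()}


def bfs_distances(graph, source):
    """Shortest-path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def closeness_centrality(graph):
    """Reachable-node count divided by the sum of shortest-path lengths.

    Assumption: the network is connected, so every node reaches all others.
    """
    scores = {}
    for v in graph:
        dist = bfs_distances(graph, v)
        total = sum(dist.values())
        scores[v] = (len(dist) - 1) / total if total > 0 else 0.0
    return scores


def rank_genes(scores):
    """Genes sorted by descending score; ties are broken alphabetically."""
    return sorted(scores, key=lambda gene: (-scores[gene], gene))


if __name__ == "__main__":
    network = build_graph(TOY_EDGES)
    print("by degree:   ", rank_genes(degree_centrality(network)))
    print("by closeness:", rank_genes(closeness_centrality(network)))
```

In practice, betweenness and eigenvector centrality would be computed on the same network and the resulting rankings compared; how the four rankings are combined is not specified here, so the sketch keeps them separate.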

Highlights

  • We compared the performance of the knowledge graph-embedding model (KGED) with that of TransE and ConvKB

  • To identify the differences between the representations of entities trained by TransE and those trained by ConvKB, we calculated cosine distances between the embedding vectors of the head and tail entities in each training triple, depending on the window of n rows of embedding

  • The head and tail entities in the negative triple were semantically unrelated. The ratio of the number of pairs of local embeddings in the cosine distance range between 0 and 0.1 was significantly higher when we used ConvKB than when we used TransE. Based on this experimental result, we assumed that this phenomenon may have improved the performance of ConvKB and KGED compared with TransE
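
The local cosine-distance comparison in the highlights above can be sketched as follows. This is one possible reading, not the authors' code: it assumes that a "window of n rows" means n consecutive embedding dimensions taken with a sliding stride of 1, and the function names (cosine_distance, local_cosine_distances, ratio_in_range) and the example vectors are hypothetical.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity for two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    # Clamp to [-1, 1] so rounding error cannot yield a negative distance.
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return 1.0 - cosine


def local_cosine_distances(head, tail, n):
    """Cosine distance between head and tail for each window of n rows.

    Assumption: windows slide over consecutive dimensions with stride 1.
    """
    if len(head) != len(tail):
        raise ValueError("head and tail embeddings must have the same length")
    if not 1 <= n <= len(head):
        raise ValueError("window size n must be between 1 and the embedding size")
    return [
        cosine_distance(head[i:i + n], tail[i:i + n])
        for i in range(len(head) - n + 1)
    ]


def ratio_in_range(distances, low=0.0, high=0.1):
    """Fraction of local distances that fall within [low, high]."""
    if not distances:
        return 0.0
    return sum(low <= d <= high for d in distances) / len(distances)


if __name__ == "__main__":
    # Made-up 4-dimensional embeddings, for illustration only.
    head = [1.0, 2.0, 3.0, 4.0]
    tail = [2.0, 4.0, -3.0, 4.0]
    distances = local_cosine_distances(head, tail, n=2)
    print(distances)
    print(ratio_in_range(distances))
```

Computing this ratio over all triples for two sets of trained embeddings (for example, one from each model) would give the kind of comparison the highlight describes.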

Introduction

Lung cancer is the most commonly diagnosed cancer (11.6% of total cases) and is the leading cause of cancer death (18.4% of total cancer deaths). This is closely followed by female breast cancer (11.6%) and prostate cancer (7.1%) [1]. Cancer is a genetic disease, and cancer-related genes are mutated and dysregulated, leading to tumor formation and cancer [2]. As genes function together in signaling and regulatory pathways, somatic mutations

