Abstract

Knowledge representation learning represents the entities and relations of a knowledge graph in a continuous low-dimensional semantic space. Recently, various representation learning models have been successfully developed to infer novel relations in general-purpose knowledge bases such as Freebase and WordNet. However, few studies have applied such models to biomedical data for inferring useful relations among biomedical entities such as genes, chemicals, diseases, and symptoms. This study aimed to compare the potential of representation learning models for extracting biomedical relations by using four different types of models, viz., TransE, PTransE, TransR, and TransH. For training and evaluating the models, we collected and utilized manually curated data from public databases, including relations among chemicals, genes, diseases, and symptoms. Overall, TransE, the most efficient translation-based monolingual knowledge graph embedding model, displayed the best performance, with a higher learning speed on large-scale biomedical data. Using TransE, we inferred new relations among chemicals, genes, diseases, and symptoms, and evaluated the reliability of these inferred relations. Furthermore, TransE outperformed an existing statistical method used in the Comparative Toxicogenomics Database for inferring new chemical-disease relations. Together, the present results show that representation learning models are useful for inferring new biological knowledge from the large body of existing biomedical data.

Highlights

  • Multi-relational data contained in common knowledge bases (KBs) are often represented using knowledge graphs [1], where nodes indicate entities and edges represent the relations linking the entities.

  • We examined the properties of the biomedical data by comparing them with two other KBs: WordNet and Freebase.

  • We compared the performance of the representation learning models (TransE, PTransE, TransR, and TransH) on the biomedical knowledge base.

Introduction

Multi-relational data contained in common knowledge bases (KBs) are often represented using knowledge graphs [1], where nodes indicate entities and edges represent the relations linking the entities. These entities and relations have been represented as vectors using representation learning models such as TransE [2], PTransE [3], TransR [4], and TransH [5], which are specialized for embedding multi-relational data in a low-dimensional vector space. These representation learning models have been widely used in statistical relational learning and help infer new knowledge in many applications, including recommender systems, the semantic web, and natural language processing [6].
