Using predicate and provenance information from a knowledge graph for drug efficacy screening

Wytze J Vlietstra, Rein Vos, Erik M Van Mulligen, Jan A Kors, Anneke M Sijbers

doi:10.1186/s13326-018-0189-6

Abstract

Background: Biomedical knowledge graphs have become important tools to computationally analyse the comprehensive body of biomedical knowledge. They represent knowledge as subject-predicate-object triples, in which the predicate indicates the relationship between subject and object. A triple can also contain provenance information, which consists of references to the sources of the triple (e.g. scientific publications or database entries). Knowledge graphs have been used to classify drug-disease pairs for drug efficacy screening, but existing computational methods have often ignored predicate and provenance information. Using this information, we aimed to develop a supervised machine learning classifier and determine the added value of predicate and provenance information for drug efficacy screening. To ensure the biological plausibility of our method, we performed our research on the protein level, where drugs are represented by their drug target proteins, and diseases by their disease proteins.

Results: Using random forests with repeated 10-fold cross-validation, our method achieved areas under the ROC curve (AUCs) of 78.1% and 74.3% for two reference sets. We benchmarked against a state-of-the-art knowledge-graph technique that does not use predicate and provenance information, obtaining AUCs of 65.6% and 64.6%, respectively. Classifiers that used only predicate information outperformed classifiers that used only provenance information, but classifiers that used both performed best.

Conclusion: We conclude that both predicate and provenance information provide added value for drug efficacy screening.

Highlights

Biomedical knowledge graphs have become important tools to computationally analyse the comprehensive body of biomedical knowledge
Drug targets and disease proteins were connected by 267,032 direct paths, and almost 50 million indirect paths
The triples were taken from 25 different knowledge sources [see Additional file 1: Table S1], and contained 45 different predicate types [Additional file 1: Table S2]
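To make the distinction between direct and indirect paths concrete, the following minimal sketch counts both kinds in a toy graph. It is an illustration under stated assumptions, not the authors' implementation: the protein identifiers, the predicates, and the helper `count_paths` are hypothetical, triples are treated as undirected, and an indirect path is assumed to pass through exactly one intermediate entity.

```python
from collections import defaultdict

# Toy triples (subject, predicate, object); all names are hypothetical.
TRIPLES = [
    ("P1", "interacts_with", "P2"),
    ("P1", "regulates", "P3"),
    ("P3", "binds", "P2"),
    ("P1", "binds", "P4"),
    ("P4", "inhibits", "P2"),
    ("P5", "binds", "P2"),
]


def count_paths(triples, drug_targets, disease_proteins):
    """Return (direct, indirect) path counts between targets and disease proteins.

    Direct: one triple links a drug target to a disease protein.
    Indirect: two triples link them through one intermediate entity (assumption).
    """
    neighbours = defaultdict(list)
    for subj, pred, obj in triples:
        neighbours[subj].append((pred, obj))
        neighbours[obj].append((pred, subj))  # undirected traversal (assumption)

    direct = indirect = 0
    for target in drug_targets:
        for _, mid in neighbours[target]:
            if mid == target:
                continue  # ignore self-loops
            if mid in disease_proteins:
                direct += 1
            for _, end in neighbours[mid]:
                if end in disease_proteins and end not in (target, mid):
                    indirect += 1
    return direct, indirect


counts = count_paths(TRIPLES, {"P1"}, {"P2"})
```

In this toy graph, `P1` reaches `P2` through one direct triple and through two intermediates (`P3` and `P4`). Indirect paths multiply quickly as the graph grows, which is in line with the far larger count of indirect paths reported above.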

Summary

Introduction

Biomedical knowledge graphs have become important tools to computationally analyse the comprehensive body of biomedical knowledge. Knowledge graphs describe biomedical entities, such as diseases, proteins, or drugs, and their relationships [1]. They represent knowledge as subject-predicate-object triples, in which the predicate indicates the relationship between an entity pair (subject and object) [2]. Knowledge contained in a variety of [...] Knowledge graphs have been applied to multiple problems in biomedical research, such as the extraction of disease biomarkers [5], identification of disease mechanisms [6], and numerous pharmacological use cases in the Open PHACTS project [7]. One of the most important use cases in computational pharmacology is the prediction of the health benefits of a drug over a placebo, i.e. its efficacy [8]. Knowledge graphs have been used to classify drug-disease pairs for drug efficacy screening, but existing computational methods have often ignored predicate and provenance information.
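The triple representation described above, extended with provenance references, can be sketched as a small data structure. This is a hedged illustration only: the `Triple` class, the `pair_features` helper, the entity names, and the reference labels are all hypothetical, and the features it derives (one count per predicate type plus a total count of provenance references) are a simple stand-in, not the feature set used in the paper.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str
    provenance: tuple = ()  # references to sources (publications, database entries)


def pair_features(triples, drug_targets, disease_proteins):
    """Summarise the triples that directly link a drug target to a disease protein."""
    features = Counter()
    for t in triples:
        links_pair = (
            (t.subject in drug_targets and t.obj in disease_proteins)
            or (t.obj in drug_targets and t.subject in disease_proteins)
        )
        if links_pair:
            features["predicate:" + t.predicate] += 1      # predicate information
            features["provenance_refs"] += len(t.provenance)  # provenance information
    return dict(features)


# Placeholder entities and reference labels, not real identifiers.
example_triples = [
    Triple("TARGET_A", "interacts_with", "DISEASE_PROT_X", ("ref-1", "ref-2")),
    Triple("DISEASE_PROT_X", "regulates", "TARGET_A", ("ref-3",)),
    Triple("TARGET_A", "interacts_with", "OTHER_PROT", ()),
]
features = pair_features(example_triples, {"TARGET_A"}, {"DISEASE_PROT_X"})
```

A feature dictionary of this kind could serve as one input row for a supervised classifier. A method that ignores predicates and provenance would reduce the same pair to the bare existence of a link.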

Objectives

Methods

Results

Discussion

Conclusion