Evolving knowledge graph similarity for supervised learning in complex biomedical domains

Rita T Sousa, Catia Pesquita, Sara Silva

doi:10.1186/s12859-019-3296-1

Abstract

Background: In recent years, biomedical ontologies have become important for describing existing biological knowledge in the form of knowledge graphs. Data mining approaches that work with knowledge graphs have been proposed, but they are based on vector representations that do not capture the full underlying semantics. An alternative is to use machine learning approaches that explore semantic similarity. However, since ontologies can model multiple perspectives, semantic similarity computations for a given learning task need to be fine-tuned to account for this. Obtaining the best combination of semantic similarity aspects for each learning task is not trivial and typically depends on expert knowledge.

Results: We have developed a novel approach, evoKGsim, that applies Genetic Programming over a set of semantic similarity features, each based on a semantic aspect of the data, to obtain the best combination for a given supervised learning task. The approach was evaluated on several benchmark datasets for protein-protein interaction prediction using the Gene Ontology as the knowledge graph to support semantic similarity, and it outperformed competing strategies, including manually selected combinations of semantic aspects emulating expert knowledge. evoKGsim was also able to learn species-agnostic models with different combinations of species for training and testing, effectively addressing the limitations of predicting protein-protein interactions for species with fewer known interactions.

Conclusions: evoKGsim can overcome one of the limitations in knowledge graph-based semantic similarity applications: the need to expertly select which aspects should be taken into account for a given application. Applying this methodology to protein-protein interaction prediction proved successful, paving the way to broader applications.
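The idea of combining per-aspect similarity scores can be illustrated with a small sketch. It is not the authors' implementation: the toy similarity values, the 0.5 cutoff, the operator set, and the use of accuracy as fitness are all assumptions made for illustration, and the seeded random search over expression trees is a much-simplified stand-in for Genetic Programming (it has no crossover or mutation). The three scores per protein pair are assumed to correspond to the three Gene Ontology aspects (biological process, cellular component, molecular function).

```python
import random

# Hypothetical toy data: per-aspect semantic similarity for protein pairs,
# ordered as (BP, CC, MF); label 1 = interacting pair, 0 = non-interacting.
PAIRS = [
    ((0.9, 0.8, 0.3), 1),
    ((0.7, 0.9, 0.2), 1),
    ((0.8, 0.6, 0.9), 1),
    ((0.2, 0.7, 0.8), 0),
    ((0.6, 0.2, 0.9), 0),
    ((0.1, 0.3, 0.2), 0),
]

# Binary operators allowed inside an expression tree (an assumed set).
OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "max": max,
    "min": min,
}

# Static combinations fixed a priori, as a baseline.
STATIC = {
    "BP": lambda s: s[0],
    "CC": lambda s: s[1],
    "MF": lambda s: s[2],
    "avg": lambda s: sum(s) / 3,
    "max": lambda s: max(s),
}


def random_tree(rng, depth):
    """Build a random expression tree; leaves are aspect indices 0..2."""
    if depth == 0 or rng.random() < 0.3:
        return rng.randrange(3)
    op = rng.choice(sorted(OPS))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))


def evaluate(tree, sims):
    """Compute the combined similarity that a tree assigns to one pair."""
    if isinstance(tree, int):
        return sims[tree]
    op, left, right = tree
    return OPS[op](evaluate(left, sims), evaluate(right, sims))


def accuracy(score_fn, pairs, cutoff=0.5):
    """Fraction of pairs classified correctly when thresholding the score."""
    hits = sum((score_fn(sims) >= cutoff) == bool(label) for sims, label in pairs)
    return hits / len(pairs)


def search(pairs, n_candidates=200, seed=0):
    """Return the best tree among the single aspects and random candidates."""
    rng = random.Random(seed)
    candidates = [0, 1, 2] + [random_tree(rng, 3) for _ in range(n_candidates)]
    return max(candidates, key=lambda t: accuracy(lambda s: evaluate(t, s), pairs))


if __name__ == "__main__":
    for name, fn in STATIC.items():
        print(name, accuracy(fn, PAIRS))
    best = search(PAIRS)
    print(best, accuracy(lambda s: evaluate(best, s), PAIRS))
```

On this toy data no single aspect separates the two classes, while the hand-written tree `("min", 0, 1)` does, which is the kind of task-specific combination a learned approach is meant to discover without expert input.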

Highlights

In recent years, biomedical ontologies have become important for describing existing biological knowledge in the form of knowledge graphs
A key aspect of our evaluation approach is to compare evoKGsim, which can evolve a combination of semantic aspects, with static combinations established a priori
Knowledge graph-based semantic similarity measures have several important biomedical applications, ranging from the prediction of protein-protein interactions and gene product function to the prediction of genes associated with diseases

Summary

Introduction

Biomedical ontologies have become important for describing existing biological knowledge in the form of knowledge graphs (KGs). Obtaining the best combination of semantic similarity aspects for each learning task is not trivial and typically depends on expert knowledge. Knowledge discovery in complex domains can be a challenge for data mining methods, which are typically limited to agnostic views of the data, with no access to its context and meaning. Semantic representations of data entities based on KGs that can be explored by data mining approaches provide a unique opportunity to enhance knowledge discovery processes. Some approaches combining methods from data mining and knowledge discovery with KGs have been proposed [6]. One of the biggest challenges faced by these approaches is how to transform data coming from KGs into a representation that data mining algorithms can process. Most existing approaches build a propositional feature vector representation of the data (i.e., each instance is represented as a vector of features), which allows the subsequent application of most existing data mining algorithms.
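A propositional feature vector representation can be sketched as follows. This is a minimal illustration under stated assumptions, not a method taken from the paper: the protein names, the term identifiers (`T1` to `T4`), and the choice to represent a pair by concatenating two binary vectors are all hypothetical.

```python
# Hypothetical annotations: protein -> set of ontology term identifiers.
ANNOTATIONS = {
    "P1": {"T1", "T3"},
    "P2": {"T2", "T3"},
    "P3": {"T4"},
}


def build_vocabulary(annotations):
    """Collect every term that annotates at least one protein, in sorted order."""
    return sorted(set().union(*annotations.values()))


def to_feature_vector(terms, vocabulary):
    """Encode a set of terms as a binary vector, one position per vocabulary term."""
    return [1 if term in terms else 0 for term in vocabulary]


def pair_features(a, b, annotations, vocabulary):
    """Represent a protein pair by concatenating the two binary vectors."""
    return (to_feature_vector(annotations[a], vocabulary)
            + to_feature_vector(annotations[b], vocabulary))


if __name__ == "__main__":
    vocab = build_vocabulary(ANNOTATIONS)
    print(vocab)
    print(pair_features("P1", "P2", ANNOTATIONS, vocab))
```

A flat encoding like this treats every term as an independent feature, so any hierarchical relation between terms in the ontology is lost unless it is encoded separately.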

Journal: BMC Bioinformatics
Publication Date: Jan 3, 2020
Citations: 27
License type: open-access
