Biomine: predicting links between biological entities using network models of heterogeneous databases

Lauri Eronen, Hannu Toivonen

doi:10.1186/1471-2105-13-119

Open Access

https://doi.org/10.1186/1471-2105-13-119

Journal: BMC Bioinformatics
Publication Date: Jun 6, 2012
Citations: 101
License type: CC BY 2.0

Affiliation: BC Platforms (Finland), University of Helsinki

Abstract

Background: Biological databases contain large amounts of data concerning the functions and associations of genes and proteins. Integration of data from several such databases into a single repository can aid the discovery of previously unknown connections spanning multiple types of relationships and databases.

Results: Biomine is a system that integrates cross-references from several biological databases into a graph model with multiple types of edges, such as protein interactions, gene-disease associations and gene ontology annotations. Edges are weighted based on their type, reliability, and informativeness. We present Biomine and evaluate its performance in link prediction, where the goal is to predict pairs of nodes that will be connected in the future, based on current data. In particular, we formulate protein interaction prediction and disease gene prioritization tasks as instances of link prediction. The predictions are based on a proximity measure computed on the integrated graph. We consider and experiment with several such measures, and perform a parameter optimization procedure where different edge types are weighted to optimize link prediction accuracy. We also propose a novel method for disease-gene prioritization, defined as finding a subset of candidate genes that cluster together in the graph. We experimentally evaluate Biomine by predicting future annotations in the source databases and prioritizing lists of putative disease genes.

Conclusions: The experimental results show that Biomine has strong potential for predicting links when a set of selected candidate links is available. The predictions obtained using the entire Biomine dataset are shown to clearly outperform ones obtained using any single source of data alone, when different types of links are suitably weighted. In the gene prioritization task, an established reference set of disease-associated genes is useful, but the results show that under favorable conditions, Biomine can also perform well when no such information is available.

The Biomine system is a proof of concept. Its current version contains 1.1 million entities and 8.1 million relations between them, with focus on human genetics. Some of its functionalities are available in a public query interface at http://biomine.cs.helsinki.fi, allowing searching for and visualizing connections between given biological entities.
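To make the idea of proximity-based link prediction on a weighted graph concrete, here is a minimal sketch. It is not the Biomine implementation. It assumes one illustrative proximity measure, the probability of the best path (the largest product of edge weights along any path), because the abstract does not say which measures are used. The node names, edge weights, and function names below are all hypothetical.

```python
import heapq
import math


def best_path_probability(graph, source, target):
    """Return the largest product of edge weights over any path source -> target.

    graph maps node -> {neighbor: weight}, with weights in (0, 1].
    Dijkstra is run on costs -log(weight), so the cheapest path is the
    most probable one. Returns 0.0 if target is unreachable.
    """
    best_cost = {source: 0.0}
    heap = [(0.0, source)]
    finished = set()
    while heap:
        cost, node = heapq.heappop(heap)
        if node in finished:
            continue
        finished.add(node)
        if node == target:
            return math.exp(-cost)
        for neighbor, weight in graph.get(node, {}).items():
            new_cost = cost - math.log(weight)
            if new_cost < best_cost.get(neighbor, math.inf):
                best_cost[neighbor] = new_cost
                heapq.heappush(heap, (new_cost, neighbor))
    return 0.0


def rank_candidate_links(graph, candidates):
    """Score each candidate (u, v) pair and sort by descending proximity."""
    scored = [(best_path_probability(graph, u, v), u, v) for u, v in candidates]
    return sorted(scored, reverse=True)


def add_edge(graph, u, v, weight):
    """Add an undirected weighted edge to the adjacency dict."""
    graph.setdefault(u, {})[v] = weight
    graph.setdefault(v, {})[u] = weight


# Hypothetical toy graph mixing several edge types (weights are made up).
toy_graph = {}
add_edge(toy_graph, "gene:A", "protein:P1", 0.9)     # gene codes for protein
add_edge(toy_graph, "protein:P1", "disease:D", 0.8)  # protein linked to disease
add_edge(toy_graph, "gene:A", "go:G", 0.5)           # ontology annotation
add_edge(toy_graph, "go:G", "disease:D", 0.5)
add_edge(toy_graph, "gene:B", "go:G", 0.4)

ranking = rank_candidate_links(
    toy_graph, [("gene:B", "disease:D"), ("gene:A", "disease:D")]
)
for score, u, v in ranking:
    print(f"{u} -> {v}: {score:.3f}")
```

In this toy setting the candidate pair with the stronger connecting path is ranked first. Per-edge-type weights, which the abstract says are tuned for prediction accuracy, would enter through the weight values.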

Highlights

Biological databases contain large amounts of data concerning the functions and associations of genes and proteins.
We describe the Biomine database, and give node proximity measures that can be used for link prediction. (Disease gene prioritization methods are deferred to the Results section, to be presented in the context of that particular application.)
The main goal of the experiments in the first subsection is to demonstrate that the proposed approach of combining data from heterogeneous data sources into a single graph proximity measure is beneficial.

Summary

Introduction

Biological databases contain a vast amount of readily accessible data concerning the function and relationships of genes and proteins, such as protein interactions, genes’ effects on diseases and functional gene annotations. A practical motivation for our work is prioritization of putative disease genes resulting from genome-wide association studies [1]. Such studies typically produce a large number of genes showing statistical association with the disease in question, of which only a fraction are biologically related to the disease. An important task is to identify the relevant genes from this list of putative disease genes.

