A large-scale benchmark of gene prioritization methods

Dimitri Guala, Erik L L Sonnhammer

doi:10.1038/srep46598

Open Access

Abstract

To maximize the use of results from high-throughput experimental studies, e.g. GWAS, for the identification and diagnostics of new disease-associated genes, it is important to have properly analyzed and benchmarked gene prioritization tools. While prospective benchmarks are underpowered to provide statistically significant results in their attempt to differentiate the performance of gene prioritization tools, a strategy for retrospective benchmarking has been missing, and new tools usually only provide internal validations. The Gene Ontology (GO) contains genes clustered around annotation terms. This intrinsic property of GO can be used to construct robust benchmarks that are objective with respect to the problem domain. We demonstrate how this can be achieved for network-based gene prioritization tools, using the FunCoup network. We use cross-validation and a set of appropriate performance measures to compare state-of-the-art gene prioritization algorithms: three based on network diffusion (NetRank and two implementations of Random Walk with Restart) and MaxLink, which uses network neighborhood. Our benchmark suite provides a systematic and objective way to compare the multitude of available and future gene prioritization tools, enabling researchers to select the best gene prioritization tool for the task at hand and helping to guide the development of more accurate methods.
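Random Walk with Restart, named above as one of the diffusion approaches, can be illustrated with a minimal sketch. This is a generic textbook formulation, not either of the implementations benchmarked in the paper. The toy network, the placeholder gene names, the restart probability of 0.3, and the function name `random_walk_with_restart` are all assumptions made for illustration.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Score every node by its steady-state visit probability.

    adjacency: dict mapping node -> dict of neighbor -> edge weight (symmetric).
    seeds: set of known genes where the walker restarts.
    restart: probability of jumping back to a seed at each step (assumed value).
    """
    nodes = sorted(adjacency)
    degree = {u: sum(adjacency[u].values()) for u in nodes}
    # Restart distribution: uniform over the seed genes.
    p0 = {u: (1.0 / len(seeds) if u in seeds else 0.0) for u in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {u: restart * p0[u] for u in nodes}
        for u in nodes:
            if degree[u] == 0:
                # Isolated node: its probability mass stays in place.
                nxt[u] += (1 - restart) * p[u]
                continue
            for v, w in adjacency[u].items():
                # Move along edge (u, v) in proportion to its weight.
                nxt[v] += (1 - restart) * p[u] * w / degree[u]
        delta = sum(abs(nxt[u] - p[u]) for u in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical four-gene chain: GENE_A - GENE_B - GENE_C - GENE_D.
toy_network = {
    "GENE_A": {"GENE_B": 1.0},
    "GENE_B": {"GENE_A": 1.0, "GENE_C": 1.0},
    "GENE_C": {"GENE_B": 1.0, "GENE_D": 1.0},
    "GENE_D": {"GENE_C": 1.0},
}
scores = random_walk_with_restart(toy_network, {"GENE_A"})
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this sketch, candidates would be prioritized by sorting on the returned scores, so genes closer to the seed in the network rank higher.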

Highlights

Disease data sets from e.g. OMIM, often used in tool validations and available benchmarks, overestimate the true predictive power of many tools.
Candidate genes returned by the gene prioritization tool were labeled as True Positives (TPs), False Positives (FPs), True Negatives (TNs) or False Negatives (FNs) based on their membership in the held-out set (Alg. 1).
We have demonstrated the use of the Gene Ontology to construct an unbiased benchmark for network-based gene prioritization tools, utilizing FunCoup, one of the most comprehensive functional association networks, as the source of interaction information.
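The labeling of candidates against the held-out set can be sketched roughly as follows. This is not the paper's Alg. 1: it assumes that genes within a rank cutoff count as "returned" and everything else as "not returned", and the function name `label_candidates`, the cutoff, and the gene names are hypothetical.

```python
def label_candidates(ranked_genes, held_out, cutoff):
    """Count TP/FP/TN/FN for one ranked candidate list.

    ranked_genes: all non-seed genes, best-scoring first.
    held_out: set of genes hidden from the tool for this fold.
    cutoff: number of top-ranked genes treated as returned (assumption).
    """
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for rank, gene in enumerate(ranked_genes):
        returned = rank < cutoff
        positive = gene in held_out
        if returned and positive:
            counts["TP"] += 1
        elif returned and not positive:
            counts["FP"] += 1
        elif not returned and positive:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


def rates(counts):
    """True positive rate and false positive rate from the four counts."""
    tpr = counts["TP"] / (counts["TP"] + counts["FN"])
    fpr = counts["FP"] / (counts["FP"] + counts["TN"])
    return tpr, fpr


# Hypothetical ranking of six candidates with two held-out genes.
ranked = ["G1", "G2", "G3", "G4", "G5", "G6"]
counts = label_candidates(ranked, held_out={"G1", "G4"}, cutoff=3)
tpr, fpr = rates(counts)
```

Sweeping the cutoff over the whole ranking would give the points of a ROC-style curve, which is one common way to summarize such counts.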

Summary

Introduction

Disease data sets from e.g. OMIM, often used in tool validations and available benchmarks, overestimate the true predictive power of many tools. In this paper we propose the use of Gene Ontology (GO)[8] together with FunCoup[9,10,11] as an objective data source for benchmarking gene prioritization tools. GO contains genes clustered around annotation terms. Besides being naturally suited for cross-validation, the model evaluation technique of choice for estimating the generalizability of a tool's performance, this clustering property improves the robustness of performance measures by decreasing the risk of erroneously assigning a gene product as a False Positive or a True Negative when more knowledge becomes available. To give the benchmarked tools equal opportunities and to avoid cross-contamination of knowledge as far as possible, FunCoup, one of the most comprehensive functional association networks and one that does not include GO data, was used as the source of interaction data.
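The cross-validation idea can be sketched as follows: the genes annotated to one GO term are split into folds, each fold is held out in turn, and the remaining genes serve as the seed set given to the tool. The number of folds, the shuffling seed, the function name `cross_validation_folds`, and the gene names are assumptions, not values taken from the paper.

```python
import random


def cross_validation_folds(term_genes, k=3, seed=0):
    """Yield (seed_genes, held_out_genes) pairs for one GO term.

    term_genes: genes annotated to the term.
    k: number of folds (assumed value).
    seed: random seed so the split is reproducible.
    """
    genes = sorted(term_genes)
    random.Random(seed).shuffle(genes)
    # Deal genes into k folds of near-equal size.
    folds = [genes[i::k] for i in range(k)]
    for i, held_out in enumerate(folds):
        seeds = [g for j, fold in enumerate(folds) if j != i for g in fold]
        yield set(seeds), set(held_out)


# Hypothetical GO term with six annotated genes.
term_genes = {"G1", "G2", "G3", "G4", "G5", "G6"}
splits = list(cross_validation_folds(term_genes, k=3))
```

Each held-out set then plays the role of the positives against which the tool's ranked candidates are labeled.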

Methods

Results

Conclusion

Journal: Scientific Reports
Publication Date: Apr 21, 2017
Citations: 42
License type: open-access

Similar Papers

PRIORI-T: A tool for rare disease gene prioritization using MEDLINE.
Aditya Rao ... Vangala G Saipradeep
PLOS ONE | VOL. 15 | 21 Apr 2020

Text mining in cancer gene and pathway prioritization.
Yuan Luo ... Peter Szolovits
Cancer Informatics | VOL. 13 | 01 Jan 2014

Phen2Gene: rapid phenotype-driven gene prioritization for rare diseases
Mengge Zhao ... Kai Wang
NAR Genomics and Bioinformatics | VOL. 2 | 25 May 2020

An unbiased evaluation of gene prioritization tools
Daniela Börnigen ... Bart De Moor
Bioinformatics | VOL. 28 | 09 Oct 2012
