GTX.Digest.VCF: an online NGS data interpretation system based on intelligent gene ranking and large-scale text mining

Yanhuang Jiang, Shuojun Yu, Yanghui Zhang, Yanwei Xi, Shaowei Zhang, Zhuo Song, Peng Lei, Qin Lu, Hua Wang, Chengkun Wu

doi:10.1186/s12920-019-0637-x

Abstract

Background

An important task in interpreting sequencing data for Mendelian diseases is to highlight pathogenic genes (or detrimental variants). Despite the recent rapid development of genomics and bioinformatics, this task remains challenging. A typical interpretation workflow includes annotation, filtration, manual inspection and literature review. Without systematic support, these steps are time-consuming and error-prone. We therefore developed GTX.Digest.VCF, an online DNA sequencing interpretation system that prioritizes genes and variants for novel disease-gene relation discovery and integrates text-mining results to provide literature evidence for these discoveries. Its phenotype-driven ranking and biological data-mining approaches significantly speed up the whole interpretation process.

Results

The GTX.Digest.VCF system is freely available for academic research as a web portal at http://vcf.gtxlab.com. Evaluation on the Deciphering Developmental Disorders (DDD) project dataset demonstrates an accuracy of 77% (235 out of 305 cases) for the top-50 genes and an accuracy of 41.6% (127 out of 305 cases) for the top-5 genes.

Conclusions

GTX.Digest.VCF provides an intelligent web portal for genomic data interpretation by integrating bioinformatics tools, distributed parallel computing and biomedical text mining. It can facilitate the application of genomic analytics in clinical research and practice.
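The abstract does not say how the phenotype-driven ranking scores genes. As a purely illustrative sketch, assuming each candidate gene (for example, a gene carrying a variant that survived filtration) is annotated with a set of phenotype terms, one simple way to rank genes is by the Jaccard overlap between the patient's phenotype terms and each gene's terms. This is a swapped-in, hypothetical scoring scheme, not the method GTX.Digest.VCF actually uses; the functions `jaccard` and `rank_genes`, the gene names and the term IDs are all made up for illustration.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_genes(
    patient_terms: Set[str],
    gene_terms: Dict[str, Set[str]],
    candidate_genes: List[str],
) -> List[Tuple[str, float]]:
    """Rank candidate genes by overlap between patient and gene phenotype terms.

    Hypothetical scoring (plain Jaccard), not GTX.Digest.VCF's actual method.
    Genes with no phenotype annotation score 0.0; ties are broken alphabetically
    so the output order is deterministic.
    """
    scored = [
        (gene, jaccard(patient_terms, gene_terms.get(gene, set())))
        for gene in candidate_genes
    ]
    # Highest score first, then gene name for a stable tie-break.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Toy, made-up data: gene names and phenotype term IDs are placeholders.
    patient = {"T1", "T2", "T3"}
    annotations = {"GENE_A": {"T1", "T2"}, "GENE_B": {"T3", "T4", "T5"}}
    for gene, score in rank_genes(patient, annotations, ["GENE_C", "GENE_B", "GENE_A"]):
        print(f"{gene}\t{score:.3f}")
```

On the toy data, GENE_A shares two of three patient terms and ranks first (0.667), GENE_B follows (0.200), and the unannotated GENE_C scores 0.000. A production ranker would likely weight terms by specificity and draw on ontology structure rather than counting exact matches.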

Highlights

An important task in the interpretation of sequencing data is to highlight pathogenic genes in the field of Mendelian diseases
Performance was validated using the Deciphering Developmental Disorders (DDD) project dataset
GTX.Digest.VCF is mainly designed for interpreting whole-exome sequencing (WES) data
Hundreds of cases from the DDD project and more than 100 cases from different organizations were tested on the GTX.Digest.VCF system
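The abstract reports accuracies "for top-50 genes" and "for top-5 genes" on 305 DDD cases. Reading these, as an assumption, as top-k accuracy (the fraction of cases whose known causal gene appears among the first k genes of the ranking), the sketch below shows how such a metric is computed and checks that the reported percentages follow from the raw counts. The function `top_k_accuracy` and the toy rankings are hypothetical and not taken from the paper.

```python
from typing import Sequence


def top_k_accuracy(
    rankings: Sequence[Sequence[str]], truths: Sequence[str], k: int
) -> float:
    """Fraction of cases whose known causal gene is among the first k ranked genes."""
    if len(rankings) != len(truths):
        raise ValueError("rankings and truths must have the same length")
    if not rankings or k < 1:
        raise ValueError("need at least one case and k >= 1")
    hits = sum(1 for ranked, truth in zip(rankings, truths) if truth in ranked[:k])
    return hits / len(rankings)


if __name__ == "__main__":
    # Toy rankings for three hypothetical cases; the causal gene is "A" in each.
    rankings = [["A", "B", "C"], ["B", "C", "A"], ["C", "A", "B"]]
    truths = ["A", "A", "A"]
    for k in (1, 2, 3):
        print(k, round(top_k_accuracy(rankings, truths, k), 3))

    # Reproducing the reported percentages from the raw counts (305 DDD cases).
    print(f"top-50: {235 / 305:.1%}")  # 77.0%
    print(f"top-5:  {127 / 305:.1%}")  # 41.6%
```

On the toy data, accuracy rises from 0.333 at k = 1 to 0.667 at k = 2 and 1.0 at k = 3. Top-k accuracy never decreases as k grows, which is why the top-50 figure is well above the top-5 figure.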

Journal: BMC Medical Genomics
Publication Date: Dec 1, 2019
Citations: 5
License type: open-access

Similar Papers

Getting started in text mining
K Bretonnel Cohen ... Lawrence Hunter
PLoS Computational Biology | VOL. 4 | 01 Jan 2008

Biomedical text mining and its applications in cancer research
Fei Zhu ... Bairong Shen
Journal of Biomedical Informatics | VOL. 46 | 15 Nov 2012

Biomedical Text Mining and Its Applications
Raul Rodriguez-Esteban ... Fran Lewitter
PLoS Computational Biology | VOL. 5 | 24 Dec 2009

Introduction to BLAH5 special issue: recent progress on interoperability of biomedical text mining
Jin-Dong Kim ... Nigel Collier
Genomics & Informatics | VOL. 17 | 27 Jun 2019
