Variant-Kudu: An Efficient Tool kit Leveraging Distributed Bitmap Index for Analysis of Massive Genetic Variation Datasets.

Jianye Fan, Shoubin Dong, Bo Wang

doi:10.1089/cmb.2019.0344

Abstract

The storage and analysis of massive genetic variation datasets in variant call format (VCF) have become a great challenge with the rapid growth of genetic variation data in recent years. Traditional single-process tool kits are increasingly inefficient when analyzing such data. Although emerging distributed storage technologies such as Apache Kudu offer an attractive solution, a distributed storage tool kit for VCF datasets still needs to be developed. In this article, we present Variant-Kudu, an efficient genome tool kit for storing and analyzing massive genetic variation datasets. Under a new distributed scheme, the genetic variation data are segmented and stored in Kudu across multiple nodes. With this scheme, data can be randomly accessed at low latency and scanned efficiently. To reduce query execution time, a distributed bitmap index strategy is proposed and a parallel query method is designed, which together expedite analyses of massive genetic variation data. Variant-Kudu is a scalable tool kit for analyzing massive genetic variation datasets, and our experiments demonstrate that it achieves high performance on a multinode cluster.
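The abstract does not describe the index layout or the query protocol, so the following is only a generic sketch of the idea of a segmented bitmap index queried in parallel. It is not Variant-Kudu's implementation and does not use the Kudu API. The record fields, the segment size, and every function name here are hypothetical, and in-memory lists stand in for partitions that would live on different nodes.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Toy variant records: (chrom, pos, filter_status). Values are made up for illustration.
VARIANTS = [
    ("1", 100, "PASS"),
    ("1", 250, "q10"),
    ("1", 900, "PASS"),
    ("2", 120, "PASS"),
    ("2", 480, "q10"),
    ("2", 770, "PASS"),
]

def make_segments(variants, size):
    """Split records into fixed-size segments (stand-ins for partitions on separate nodes)."""
    return [variants[i:i + size] for i in range(0, len(variants), size)]

def build_bitmap_index(segment):
    """One bitmap per (column, value); bit i is set when row i of the segment has that value."""
    index = defaultdict(int)
    for row, (chrom, _pos, flt) in enumerate(segment):
        index[("chrom", chrom)] |= 1 << row
        index[("filter", flt)] |= 1 << row
    return index

def query_segment(segment, index, predicates):
    """AND the bitmaps of all equality predicates, then fetch only the matching rows."""
    bits = (1 << len(segment)) - 1  # start with every row selected
    for pred in predicates:
        bits &= index.get(pred, 0)  # a value absent from this segment matches no rows
    return [segment[i] for i in range(len(segment)) if (bits >> i) & 1]

def parallel_query(segments, indexes, predicates):
    """Evaluate the query on every segment concurrently and concatenate results in segment order."""
    with ThreadPoolExecutor() as pool:
        parts = pool.map(query_segment, segments, indexes, [predicates] * len(segments))
        return [row for part in parts for row in part]

segments = make_segments(VARIANTS, 2)
indexes = [build_bitmap_index(s) for s in segments]
hits = parallel_query(segments, indexes, [("chrom", "2"), ("filter", "PASS")])
print(hits)
```

Each segment answers the conjunctive query with bitwise ANDs over its local bitmaps, so rows are materialized only where every predicate matches. A real deployment would presumably use compressed bitmaps and run each segment's query on the node that holds it; both are assumptions beyond what the abstract states.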

Journal: Journal of computational biology : a journal of computational molecular cell biology
Publication Date: Jan 6, 2020
Citations: 1

Similar Papers

Analysis-ready VCF at Biobank scale using Zarr.
Eric Czech ... Jerome Kelleher
bioRxiv : the preprint server for biology
12 Jun 2024

Improved VCF normalization for accurate VCF comparison.
Arash Bayat ... Sri Parameswaran
Bioinformatics (Oxford, England) | VOL. 33
30 Dec 2016

Abstract 2587: VCF2CNA: a tool for efficiently detecting copy number alteration using VCF genotype data
Daniel K Putnam ... Jinghui Zhang
Cancer Research | VOL. 77
01 Jul 2017

Pattern analysis of genetics and genomics: a survey of the state-of-art
Jyotismita Chaki ... Nilanjan Dey
Multimedia Tools and Applications | VOL. 79
18 Jan 2019
