Ultrafast and scalable variant annotation and prioritization with big functional genomics data.

Dandan Huang, Chenghao Xuan, Junwen Wang, Hang Xu, Miaoxin Li, Panwen Wang, Pak Chung Sham, Kai Wang, Hoi Shan Kwan, Jianhua Wang, Lei Shi, Wenyan Nong, Weidong Li, Yao Zhou, Xianfu Yi, Hongcheng Yao, Shijie Zhang, Mulin Jun Li

doi:10.1101/gr.267997.120

Abstract

Advances in large-scale genomics studies have enabled the compilation of cell type–specific, genome-wide DNA functional elements at high resolution. With the growing volume of functional annotation data and sequencing variants, existing variant annotation algorithms lack the efficiency and scalability to process big genomic data, particularly when annotating whole-genome sequencing variants against a huge database with billions of genomic features. Here, we develop VarNote to rapidly annotate genome-scale variants in large and complex functional annotation resources. Equipped with a novel index system and a parallel random-sweep searching algorithm, VarNote shows substantial performance improvements (two to three orders of magnitude) over existing algorithms at different scales. It supports both region-based and allele-specific annotations and introduces advanced functions for the flexible extraction of annotations. By integrating massive base-wise and context-dependent annotations in the VarNote framework, we introduce three efficient and accurate pipelines to prioritize the causal regulatory variants for common diseases, Mendelian disorders, and cancers.
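
The distinction between region-based and allele-specific annotation can be illustrated with a minimal Python sketch. This is not VarNote's index system or its random-sweep search, neither of which is described in the abstract; it is a plain sorted-list lookup with binary search on a single chromosome. The `Feature` record, the function names, and the example labels are all hypothetical and exist only for illustration.

```python
from bisect import bisect_left, bisect_right
from typing import NamedTuple, Optional


class Feature(NamedTuple):
    """Hypothetical annotation record on one chromosome (1-based, inclusive)."""
    start: int
    end: int
    label: str
    ref: Optional[str] = None  # set only for allele-specific records
    alt: Optional[str] = None


def build_index(features):
    """Sort features by start and record the longest feature length."""
    feats = sorted(features, key=lambda f: f.start)
    starts = [f.start for f in feats]
    max_len = max((f.end - f.start + 1 for f in feats), default=0)
    return feats, starts, max_len


def region_hits(index, pos):
    """Region-based annotation: every feature whose interval covers pos."""
    feats, starts, max_len = index
    # A feature starting before pos - max_len + 1 must end before pos.
    lo = bisect_left(starts, pos - max_len + 1)
    hi = bisect_right(starts, pos)  # features with start <= pos
    return [f for f in feats[lo:hi] if f.end >= pos]


def allele_hits(index, pos, ref, alt):
    """Allele-specific annotation: position and both alleles must match."""
    return [
        f for f in region_hits(index, pos)
        if f.start == pos and f.ref == ref and f.alt == alt
    ]


# Illustrative data only; not taken from any real annotation database.
features = [
    Feature(100, 200, "enhancer"),
    Feature(150, 150, "score=0.9", "A", "G"),
    Feature(150, 150, "score=0.1", "A", "T"),
    Feature(300, 400, "promoter"),
]
index = build_index(features)
region_labels = [f.label for f in region_hits(index, 150)]
allele_labels = [f.label for f in allele_hits(index, 150, "A", "G")]
```

In this sketch a variant at position 150 overlaps the enhancer interval and both base-wise records under region-based matching, while allele-specific matching for A>G keeps only the record whose reference and alternate alleles agree with the variant.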

Journal: Genome Research
Publication Date: Oct 15, 2020
Citations: 19
License type: cc-by-nc

Similar Papers

Software architecture for adaptive in silico knowledge discovery and decision making based on big genomic data analytics
Veska Gancheva ... Ivailo Georgiev
01 Jan 2019

ParSMURF, a high-performance computing tool for the genome-wide detection of pathogenic variants.
Matteo Re ... Giuliano Grossi
GigaScience | VOL. 9
01 May 2020

The cancer multiple: Producing and translating genomic big data into oncology care
Tiên-Dung Hà ... Peter A Chow-White
Big Data & Society | VOL. 8
01 Jan 2020

SeqVItA: Sequence Variant Identification and Annotation Platform for Next Generation Sequencing Data.
Prashanthi Dharanipragada ... Sampreeth Reddy Seelam
Frontiers in Genetics | VOL. 9
14 Nov 2018