HiGene: A high-performance platform for genomic data analysis

Liqun Deng, Youliang Yan, Guowei Huang, Jiansheng Wei, Yuzheng Zhuang

doi:10.1109/bibm.2016.7822584

Abstract

Post-sequencing genomic data analysis has become a major challenge as next-generation sequencing technologies advance by leaps and bounds. The data-intensive and compute-intensive nature of genome analysis makes cluster computing an attractive choice for building efficient solutions. This paper presents HiGene, a high-performance genome analysis platform that applies big data technology to genomic data processing. HiGene reconstructs the genome analysis pipeline by exploiting both multi-core and multi-node parallelization using Apache Spark, and employs two key techniques to further boost performance. First, a dynamic computing resource re-allocator is implemented, which allows flexible on-demand resource allocation for operations inside tasks. Second, an efficient skew mitigation approach is proposed, which automatically identifies and resolves data skew and computation skew through task repartitioning and resource reallocation, respectively. HiGene has been evaluated with a whole human genome dataset on a 10-node Huawei 5885 cluster. Experimental results show that HiGene reduces the total running time on a whole genome sequence dataset from days to nearly one hour. Furthermore, it is two times faster than state-of-the-art cluster-based approaches.
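
To make the idea of resolving data skew through repartitioning more concrete, here is a minimal sketch in plain Python. It is not HiGene's implementation, which the abstract does not describe in detail. The function names `find_skewed` and `repartition`, the median-based threshold, and the `factor` parameter are all assumptions chosen for illustration. The sketch flags partitions that are much larger than the median and splits them into roughly median-sized chunks.

```python
from statistics import median


def find_skewed(partition_sizes, factor=2.0):
    """Return indices of partitions larger than factor * median size.

    The median-based threshold is an illustrative assumption,
    not the detection rule used by HiGene.
    """
    threshold = factor * median(partition_sizes)
    return [i for i, n in enumerate(partition_sizes) if n > threshold]


def repartition(partitions, factor=2.0):
    """Split each skewed partition into chunks of about median size."""
    sizes = [len(p) for p in partitions]
    target = max(1, int(median(sizes)))  # chunk size for oversized partitions
    skewed = set(find_skewed(sizes, factor))
    balanced = []
    for i, part in enumerate(partitions):
        if i in skewed:
            # Break the oversized partition into consecutive slices.
            balanced.extend(
                part[j:j + target] for j in range(0, len(part), target)
            )
        else:
            balanced.append(part)
    return balanced


# Hypothetical workload: one partition holds far more records than the rest.
partitions = [list(range(10)), list(range(12)), list(range(100)), list(range(8))]
balanced = repartition(partitions)
```

In a Spark setting, the same idea would typically be applied to the records of an oversized partition before the expensive stage runs, so that no single task dominates the stage's running time. Computation skew, which the abstract says is handled by resource reallocation rather than repartitioning, is not covered by this sketch.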
