High-performance genomic analysis framework with in-memory computing

Xueqi Li, Guangming Tan, Bingchen Wang, Ninghui Sun

doi:10.1145/3200691.3178511

Abstract

In this paper, we propose GPF, an in-memory computing framework that provides a set of genomic formats, APIs, and a fast genomic engine for large-scale genomic data processing. GPF comprises two main components: (1) scalable genomic data formats and APIs, and (2) an advanced execution engine that supports efficient compression of genomic data and eliminates redundancies during execution. We further present both system-specific and algorithm-specific implementations that let users build genomic analysis pipelines without any knowledge of Spark parallel programming. To evaluate the performance of GPF, we built a Whole-Genome-Sequencing (WGS) pipeline on top of it as a test case. Our experimental data indicate that GPF completes WGS analysis of the 146.9G-base Human Platinum Genome in 24 minutes, with over 50% parallel efficiency on 2048 CPU cores. Overall, GPF provides a fast and general engine for large-scale genomic data processing that supports in-memory computing.
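To put the reported figures in perspective, the following is a minimal sketch of the arithmetic behind them. It assumes the standard strong-scaling definition of parallel efficiency (measured speedup divided by ideal speedup relative to a baseline run); the abstract does not state which baseline the authors used. The function names and the baseline numbers in the efficiency example are purely illustrative and are not taken from the paper.

```python
def throughput_bases_per_sec(total_bases: float, minutes: float) -> float:
    """Aggregate processing rate in bases per second."""
    return total_bases / (minutes * 60.0)


def parallel_efficiency(t_base: float, p_base: int,
                        t_scaled: float, p_scaled: int) -> float:
    """Strong-scaling efficiency: measured speedup over ideal speedup.

    t_base, p_base     -- runtime and core count of the baseline run
    t_scaled, p_scaled -- runtime and core count of the scaled-up run
    """
    speedup = t_base / t_scaled
    ideal_speedup = p_scaled / p_base
    return speedup / ideal_speedup


# Figures stated in the abstract.
TOTAL_BASES = 146.9e9   # 146.9G bases
RUNTIME_MIN = 24.0      # minutes
CORES = 2048

rate = throughput_bases_per_sec(TOTAL_BASES, RUNTIME_MIN)
per_core = rate / CORES
print(f"aggregate: {rate / 1e6:.1f} Mbases/s")
print(f"per core:  {per_core / 1e3:.1f} kbases/s")

# Illustrative only (NOT from the paper): a run that takes 100 time units
# on 1 core and 25 time units on 8 cores has a 4x speedup against an
# ideal 8x, which is 50% parallel efficiency.
print(parallel_efficiency(100.0, 1, 25.0, 8))
```

Under these assumptions, the stated 24-minute run corresponds to roughly 102 million bases per second in aggregate across the 2048 cores.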

Journal: ACM SIGPLAN Notices
Publication Date: Feb 10, 2018
Citations: 3
