Parallel Accelerating Ultra-Long Read Alignment by Vertical Partitioning Data

Deng Pan, Danyang Chen, Feng Yang, Jinxiong Zhang, Cheng Zhong

doi:10.1109/paap56126.2022.10010526

Abstract

Aligning sequencing reads to a genome is a basic task in biological big data analysis. Reads produced by third-generation sequencing are getting longer, and datasets are getting larger. To address the ultra-long read alignment problem, which places high demands on computing and memory capacity, a strategy for vertically partitioning ultra-long reads on a hybrid CPU/GPU cluster is proposed. On every computing node of the parallel cluster, a heap data structure filters the local alignment results by alignment score, which reduces the amount of data transmitted. Early termination and parallel merging-splicing are used to accelerate the splicing of local alignment results. The local alignment results from all computing nodes are collected and extended to obtain the final alignment results. Experimental results on simulated and real ultra-long read datasets show that the proposed parallel alignment algorithm achieves high overall alignment accuracy, sensitivity, and base-level sensitivity, and accelerates the alignment of ultra-long reads to the reference genome.
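The partitioning and heap-based filtering described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `LocalHit` record, the fixed-length split in `partition_read`, and the parameter `k` are all hypothetical names and choices made for illustration, and the abstract does not specify how segments are cut or how many results each node keeps.

```python
import heapq
from typing import List, NamedTuple


class LocalHit(NamedTuple):
    """Hypothetical record for one local alignment of a read segment."""
    score: int       # alignment score (higher is better)
    segment_id: int  # index of the segment within the partitioned read
    ref_pos: int     # start position on the reference genome


def partition_read(read: str, segment_len: int) -> List[str]:
    """Split one ultra-long read into consecutive segments.

    Assumption: fixed-length segments; the last one may be shorter.
    """
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    return [read[i:i + segment_len] for i in range(0, len(read), segment_len)]


def top_k_hits(hits: List[LocalHit], k: int) -> List[LocalHit]:
    """Keep the k highest-scoring hits using a bounded min-heap.

    Only these k records would need to be sent to other nodes,
    which is how filtering can shrink the transmitted data.
    """
    if k <= 0:
        return []
    heap: List[LocalHit] = []
    for hit in hits:
        if len(heap) < k:
            heapq.heappush(heap, hit)
        elif hit > heap[0]:
            # The heap root is the weakest kept hit; replace it.
            heapq.heapreplace(heap, hit)
    # Return best-first for readability.
    return sorted(heap, reverse=True)


if __name__ == "__main__":
    segments = partition_read("ACGTACGTAC", 4)
    print(segments)
    hits = [
        LocalHit(10, 0, 5),
        LocalHit(50, 1, 100),
        LocalHit(30, 2, 7),
        LocalHit(40, 0, 900),
    ]
    print(top_k_hits(hits, 2))
```

A bounded min-heap keeps memory at O(k) per node and processes each result in O(log k), so the filter stays cheap even when a node produces many local alignments.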

Full Text