GPU-Accelerated Parallel Aligning Long Reads with High Error Rate Using Enhanced Sparse Suffix Array

Hao Wei, Jinxiong Zhang, Cheng Zhong, Danyang Chen, Mengxiao Yin

doi:10.1007/978-981-15-2767-8_28

Abstract

Read alignment (sequence alignment) is one of the most fundamental and time-consuming problems in bioinformatics. In this paper, a CPU-GPU parallel method for long-read alignment is studied to address this problem. To fit the limited memory capacity of the GPU architecture, the index of the reference genome is stored in a lightweight data structure based on an enhanced sparse suffix array. The two-dimensional search space between the reference genome and the long reads is divided into several search sub-spaces, and the alignment of a massive set of long reads is further divided into multiple smaller long-read alignment tasks. By improving the seed selection scheme, a CPU-GPU parallel algorithm for aligning long reads with high error rates is implemented. The experimental results show that the parallel algorithm remarkably accelerates long-read alignment while maintaining overall alignment accuracy and recall.
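The abstract does not spell out the index layout, so the following is only a rough illustration of why a sparse suffix array saves memory. This minimal Python sketch assumes a sparseness factor `k` and indexes only the suffixes that start at multiples of `k`, which uses about n/k entries instead of n. It then recovers every exact occurrence of a seed by searching the seed at each of its first `k` offsets. The function names `build_sparse_sa` and `find_seed` and the naive construction are hypothetical. The auxiliary arrays that make the paper's index "enhanced", its seed selection scheme, and its GPU parallelization are not modeled here.

```python
from typing import List, Tuple


def build_sparse_sa(text: str, k: int) -> List[int]:
    """Hypothetical helper: sort only the suffixes starting at multiples of k.
    Naive O((n/k) * n log n) construction, fine for a toy example."""
    return sorted(range(0, len(text), k), key=lambda i: text[i:])


def _prefix_range(text: str, sa: List[int], q: str) -> Tuple[int, int]:
    """Return [lo, hi) such that sa[lo:hi] are exactly the sampled suffixes
    that start with q. Truncated prefixes are monotone in suffix order,
    so two binary searches suffice."""
    m = len(q)
    lo, hi = 0, len(sa)
    while lo < hi:  # lower bound: first prefix >= q
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < q:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:  # upper bound: first prefix > q
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= q:
            lo = mid + 1
        else:
            hi = mid
    return start, lo


def find_seed(text: str, sa: List[int], k: int, seed: str) -> List[int]:
    """Hypothetical helper: all reference positions where `seed` occurs exactly.
    Complete only when len(seed) >= k, because every occurrence must then
    cover some sampled position r + j with 0 <= j < k."""
    hits = set()
    for j in range(min(k, len(seed))):
        lo, hi = _prefix_range(text, sa, seed[j:])
        for idx in range(lo, hi):
            start = sa[idx] - j
            # Verify the j characters before the sampled position.
            if start >= 0 and text[start:start + j] == seed[:j]:
                hits.add(start)
    return sorted(hits)


ref = "ACGTACGTTACGA"
sa = build_sparse_sa(ref, 3)
print(sa)                           # [12, 9, 0, 6, 3]
print(find_seed(ref, sa, 3, "ACG"))  # [0, 4, 9]
```

The design trade-off is visible in `find_seed`: storing only n/k suffixes shrinks the index by roughly a factor of `k`, but each seed lookup costs `k` binary searches instead of one. This kind of memory-for-time exchange suits a memory-limited device such as a GPU.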
