Abstract

Read mapping is one of the most important applications in bioinformatics. With the rapidly increasing number of reads produced by next-generation sequencing (NGS) technology, there is a need for fast and efficient high-throughput read mappers. In this paper, we present FMapper, a highly scalable read mapper for the TaihuLight supercomputer, optimized for its fourth-generation ShenWei many-core architecture (SW26010). To fully exploit the computational power of the SW26010, we employ dynamic task scheduling and asynchronous I/O and data transfers, and we implement a vectorized version of the banded Myers algorithm tailored to the 256-bit vector registers of the SW26010. Our performance evaluation demonstrates that FMapper, using all 4 compute groups of a single SW26010 processor, outperforms S-Aligner on the same hardware as well as RazerS3, Hobbes3, Minimap2 and BWA running on a 4-core Xeon W-2123v3 CPU, achieving speedups of 4.7, 24.8, 2.4, 4.6 and 14.7, respectively. Using several optimizations, we achieve a speedup of 6 over the naïve implementation on one compute group of an SW26010 processor and a strong scaling efficiency of 65% on 512 compute groups.
