Abstract

The FM-index is a compact data structure suited to fast matching of short reads against large reference genomes. The matching algorithm that uses this index exhibits irregular memory access patterns that cause frequent cache misses, making the problem memory bound. This paper analyzes the FM-index versions presented in the literature, focusing on the computing aspects related to data access. Based on this analysis, we propose a new organization of the FM-index that minimizes the demand for memory bandwidth. This substantially improves performance on processors with high-bandwidth memory, such as the second-generation Intel Xeon Phi (Knights Landing, or KNL), which integrates ultra high-bandwidth stacked memory technology. As the roofline model shows, our implementation reaches 95 percent of the peak random access bandwidth limit when executed on the KNL, and almost all of the available bandwidth when executed on other Intel Xeon architectures with conventional DDR memory. In addition, the throughput obtained on the KNL is much higher than the results reported for GPUs in the literature.

Highlights

  • The high demand for fast and low-cost genomic sequencing has pushed onward the rapid development of next-generation sequencing (NGS) technologies

  • We have evaluated our proposal on a system with an Intel Xeon Phi 7210 processor [10], [11] that includes 64 cores and 16 GiB of stacked 3D MCDRAM integrated on package

  • Our optimized data structure packs all relevant data needed in a query step within a single cache block, minimizing the memory bandwidth demand


Summary

INTRODUCTION

The high demand for fast and low-cost genomic sequencing has pushed onward the rapid development of next-generation sequencing (NGS) technologies. Sequence aligners based on the FM-index support inexact matching built on top of exact alignments, which makes the memory access pattern even less predictable. These data access patterns cause a high cache miss rate on the typical cache hierarchies of multicore processors. A new organization of the FM-index data structure layout and codification is proposed, which reduces the traffic required between memory and processor cores for the exact search process. An optimized exact matching algorithm has been implemented based on the proposed FM-index, exploiting the ultra high-bandwidth memory modules integrated in the KNL processor. Our results show that performance on KNL reaches 95% of the peak random access bandwidth limit, outperforming other CPU and GPU solutions reported in the literature.
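To make the access pattern concrete, the following is a minimal sketch of exact matching by backward search over a naive, unsampled FM-index. It is an illustration only, not the organization proposed in the paper: the function names `build_fm_index` and `count_matches`, the `$` sentinel, and the full occurrence table are assumptions chosen for brevity. Each pattern character triggers two lookups in the occurrence table at positions that depend on the previous step, which is the irregular, data-dependent behavior that leads to cache misses on large references.

```python
from collections import Counter


def build_fm_index(text):
    """Build a naive (unsampled) FM-index for `text`; '$' is the sentinel."""
    text += "$"
    # Suffix array by direct sorting: fine for a demo, far too slow for genomes.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character that precedes each sorted suffix.
    bwt = "".join(text[i - 1] for i in sa)
    # C[c]: number of characters in the text strictly smaller than c.
    counts = Counter(text)
    C, total = {}, 0
    for c in sorted(counts):
        C[c] = total
        total += counts[c]
    # occ[c][i]: occurrences of c in bwt[:i] (full table, no sampling).
    occ = {c: [0] * (len(bwt) + 1) for c in counts}
    for i, b in enumerate(bwt):
        for c in counts:
            occ[c][i + 1] = occ[c][i] + (b == c)
    return C, occ, len(bwt)


def count_matches(pattern, C, occ, n):
    """Backward search: return how many times `pattern` occurs in the text."""
    lo, hi = 0, n  # half-open suffix array interval [lo, hi)
    for c in reversed(pattern):
        if c not in C:
            return 0
        # Two table reads per step, at positions set by the previous step.
        lo = C[c] + occ[c][lo]
        hi = C[c] + occ[c][hi]
        if lo >= hi:
            return 0
    return hi - lo
```

A call such as `count_matches("ATTA", *build_fm_index("GATTACAGATTACA"))` counts the occurrences of the read in the reference. Practical implementations replace the full `occ` table with sampled counters plus a rank computation over a block of BWT symbols, trading extra instructions for a smaller memory footprint.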

FM-index
Suffix Array
FM-index data structure
Rank Query Implementations
Exact Matching Using FM-Index
Full FM-index
Sampled FM-index
K-step Sampled FM-index
Memory footprint
Memory access pattern
Search intensity
Throughput
Optimizing Throughput
Approach
Description
Search Intensity and Throughput
THROUGHPUT BOUNDS ANALYSIS
Instruction count
Random Memory Access Benchmark
Throughput Bounds
Experimental Setup and Methodology
Roofline Model
Comparison with Other Implementations
RELATED WORK
Findings
CONCLUSIONS
