Abstract

High-throughput next-generation sequencers (NGS) can rapidly read billions of short DNA fragments, called reads, at low cost. Moreover, their throughput is increasing and their cost is decreasing at rates much faster than Moore's law. This demands commensurate acceleration of NGS secondary analysis, which processes the reads to identify variations between genomes. Conventional architectural improvements can at best improve performance at the rate of Moore's law, even if the software tools utilize the underlying architecture efficiently. Unfortunately, most of the dozens of software products developed for this purpose fail to exploit the underlying architecture well. Therefore, to match the pace of development of the sequencers, we will need architecture that is better tailored to the computational requirements of NGS secondary analysis, as well as software that uses the architecture optimally. To this end, we study the performance characteristics of NGS secondary analysis and investigate the suitability of modern Intel Xeon and Xeon Phi processors for this workload. To keep the study manageable, we rely on recent studies that attribute a majority of the run time to a few key kernels. We present detailed optimization efforts to accelerate these kernels on the latest Intel Xeon and Xeon Phi processors, with the goal of extracting maximum performance. A comparison of our optimized implementations with published results for GPGPU implementations shows that our implementations on the Skylake processors yield the highest performance. We also present an in-depth analysis of the key kernels and identify their performance characteristics and bottlenecks to inform future architectural designs.
