Abstract
The unprecedented volume of short reads produced by new high-throughput sequencers makes it challenging to align those reads to reference genomes with both high sensitivity and high speed. Many CPU-based short-read aligners have been developed to address this challenge. Among them, one popular approach is the seed-and-extend heuristic. For this heuristic, the first and foremost step is to generate seeds between the input reads and the reference genome, and hash tables are the data structure most frequently used for this purpose. However, although hash tables usually offer nearly constant query time, they are memory-consuming, which makes them poorly suited to memory-constrained many-core architectures such as GPUs. The Burrows-Wheeler transform (BWT) provides a memory-efficient alternative, with the drawback that query time grows with query length. In this paper, we investigate GPU-based fixed-length seed generation for computational genomics based on the BWT and the Ferragina-Manzini (FM) index, where k-mers from the reads are searched against a reference genome (indexed using the BWT) to find k-mer matches (i.e., seeds). In addition to exact matches, mismatches are allowed at any position within a seed, unlike spaced seeds, which allow mismatches only at predefined positions. By evaluating the performance of our GPU version relative to an equivalent CPU version, we intend to provide useful guidance for the development of GPU-based seed generators for aligners based on the seed-and-extend paradigm.
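To make the k-mer search concrete, the following is a minimal CPU-side sketch in Python of FM-index backward search, with an optional mismatch budget handled by backtracking. It is an illustration under stated assumptions, not the GPU implementation evaluated in the paper: the function names (`build_fm_index`, `backward_search`, `search_with_mismatches`), the uncompressed occurrence table, and the naive suffix-array construction are all hypothetical choices made for readability.

```python
def build_fm_index(text):
    """Build a toy FM-index: suffix array, C table, and full Occ table.

    Assumption: '$' is a sentinel that sorts before every other symbol.
    The naive suffix sort is only suitable for tiny references.
    """
    text += "$"
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT symbol for each suffix is the character preceding it
    # (text[-1] is the sentinel, which handles the suffix at position 0).
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # C[c]: number of symbols in text that are strictly smaller than c.
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += text.count(c)
    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return sa, C, occ


def backward_search(kmer, sa, C, occ):
    """Exact match: one interval update per k-mer symbol, right to left."""
    lo, hi = 0, len(sa)  # half-open suffix-array interval
    for ch in reversed(kmer):
        if ch not in C:
            return []
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


def search_with_mismatches(kmer, sa, C, occ, max_mm, alphabet="ACGT"):
    """Allow up to max_mm substitutions at any position via backtracking."""
    hits = set()

    def recurse(i, lo, hi, mm):
        if i < 0:  # whole k-mer consumed: report all positions in interval
            hits.update(sa[lo:hi])
            return
        for c in alphabet:
            if c not in C:
                continue
            cost = mm + (c != kmer[i])
            if cost > max_mm:
                continue
            nlo = C[c] + occ[c][lo]
            nhi = C[c] + occ[c][hi]
            if nlo < nhi:
                recurse(i - 1, nlo, nhi, cost)

    recurse(len(kmer) - 1, 0, len(sa), 0)
    return sorted(hits)


if __name__ == "__main__":
    sa, C, occ = build_fm_index("ACGTACGTTACG")
    print(backward_search("ACG", sa, C, occ))
    print(search_with_mismatches("ACC", sa, C, occ, max_mm=1))
```

The exact search performs one interval update per symbol, which is why query time depends on the k-mer length rather than being constant as in a hash-table lookup. The mismatch variant tries every substitution at every position, so its cost grows quickly with the mismatch budget; a practical index would also store a sampled occurrence table instead of the full one used here.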