Abstract

The unprecedented volume of short reads produced by new high-throughput sequencers has made it challenging to align those reads to reference genomes with both high sensitivity and high speed. Many CPU-based short read aligners have been developed to address this challenge. Among them, one popular approach is the seed-and-extend heuristic. For this heuristic, the first and foremost step is to generate seeds between the input reads and the reference genome, and hash tables are the data structure most frequently used for this step. However, although hash tables usually offer nearly constant query time, they are memory-consuming, which makes them poorly suited to memory-constrained many-core architectures such as GPUs. The Burrows-Wheeler transform (BWT) provides a memory-efficient alternative, with the drawback that its query time grows with the query length. In this paper, we investigate GPU-based fixed-length seed generation for computational genomics based on the BWT and the Ferragina-Manzini (FM) index, where k-mers from the reads are searched against a reference genome (indexed using the BWT) to find k-mer matches (i.e. seeds). In addition to exact matches, mismatches are allowed at any position within a seed, unlike spaced seeds, which allow mismatches only at predefined positions. By evaluating the performance of our GPU version relative to an equivalent CPU version, we intend to provide some useful guidance for the development of GPU-based seed generators for aligners based on the seed-and-extend paradigm.
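
To make the k-mer search concrete, the following is a minimal CPU-side sketch in Python of FM-index backward search with a mismatch budget. It is an illustration of the general technique only, not the GPU implementation evaluated in the paper: the function names `build_fm_index` and `find_seeds`, the `max_mismatches` parameter, the naive suffix-array construction, and the uncompressed occurrence table are all assumptions made for readability. It also assumes the reference contains no `$` character, which is used as the end-of-text sentinel.

```python
def build_fm_index(reference):
    """Build a toy FM-index (suffix array, C table, Occ table) for a reference.

    Hypothetical helper: uses a naive O(n^2 log n) suffix sort and a full
    occurrence table, which is only practical for very small inputs.
    """
    text = reference + "$"  # "$" is the sentinel and sorts before A, C, G, T
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT character for each sorted suffix is the one preceding it in the text.
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))

    # C[c]: number of characters in the text that are strictly smaller than c.
    counts = {c: 0 for c in alphabet}
    for ch in text:
        counts[ch] += 1
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += counts[c]

    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return sa, C, occ


def find_seeds(kmer, index, max_mismatches=0):
    """Return sorted reference positions matching kmer within a Hamming budget.

    The k-mer is consumed right to left (backward search). At each position
    every base is tried; a base that differs from the k-mer costs one mismatch.
    """
    sa, C, occ = index
    if not kmer:
        return []
    hits = set()

    def extend(pos, lo, hi, budget):
        if lo >= hi:  # empty suffix-array interval: no occurrence
            return
        if pos < 0:  # whole k-mer consumed: report all positions in interval
            hits.update(sa[lo:hi])
            return
        for c in "ACGT":
            if c not in C:  # base absent from the reference
                continue
            cost = 0 if c == kmer[pos] else 1
            if cost > budget:
                continue
            # One backward-search step on the half-open interval [lo, hi).
            extend(pos - 1, C[c] + occ[c][lo], C[c] + occ[c][hi], budget - cost)

    extend(len(kmer) - 1, 0, len(sa), max_mismatches)
    return sorted(hits)
```

For example, with `index = build_fm_index("ACGTACGTTACG")`, the call `find_seeds("ACG", index)` returns `[0, 4, 9]`, `find_seeds("ACT", index)` returns `[]`, and `find_seeds("ACT", index, max_mismatches=1)` returns `[0, 4, 9]`. Each backward-search step costs two table lookups, so an exact query takes time proportional to the k-mer length, which is the length-dependent query cost noted above; allowing mismatches multiplies the number of branches explored.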

