Abstract

Genome sequence assembly is a highly compute-intensive problem in bioinformatics. Many parallel algorithms have been proposed to accelerate it on multicore processors as well as on clusters of machines. In recent years, the growing computational power of GPUs has allowed applications from many research fields to exploit the massive number of cores in a single GPU, and in multiple GPUs working together in parallel. In this paper we present the design and development of a multi-GPU assembler for sequence assembly on Nvidia GPUs. Because GPU memory is limited, we build our parallel solution on the string graph approach, since the string graph is a memory-efficient data structure. Our assembler (GAMS) takes as input a file of reads in FASTA format, as produced by current NGS technologies, and builds the string graph from it. Contigs are formed by grouping the regions of the graph that can be unambiguously connected. We also present parallel algorithms for string graph construction and graph simplification. We have benchmarked our assembler on five bacterial genomes and on chromosome 22 (chr22) of the human genome. Our results show that the multi-GPU design provides a 6-7x speedup over a state-of-the-art parallel Velvet implementation. The assemblies it produces are also of significantly better quality.
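The contig-formation step mentioned above (grouping regions of the graph that can be unambiguously connected) can be illustrated with a small sequential sketch. This is not the GAMS implementation and it does not run on a GPU. It assumes a simplified directed overlap graph in which each node is a read and each edge is an overlap, and it ignores reverse complements and edge labels, which a real string graph must handle. The function name `unambiguous_paths` and the read names are hypothetical.

```python
from collections import defaultdict


def unambiguous_paths(edges):
    """Group nodes of a directed graph into maximal non-branching paths.

    Illustrative assumption: `edges` is a list of (u, v) pairs meaning
    "read u overlaps read v". Each returned path stands for one contig.
    """
    succ = defaultdict(list)
    pred = defaultdict(list)
    nodes = set()
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)
        nodes.update((u, v))

    def unambiguous(u, v):
        # Edge u -> v can be merged only if it is u's sole way out
        # and v's sole way in.
        return len(succ[u]) == 1 and len(pred[v]) == 1

    paths = []
    visited = set()

    def walk(start):
        # Follow unambiguous edges from `start` until a branch or a revisit.
        path = [start]
        visited.add(start)
        cur = start
        while len(succ[cur]) == 1:
            nxt = succ[cur][0]
            if nxt in visited or not unambiguous(cur, nxt):
                break
            path.append(nxt)
            visited.add(nxt)
            cur = nxt
        return path

    # First pass: start only at nodes that cannot be merged into a predecessor.
    for node in sorted(nodes):
        if len(pred[node]) == 1 and unambiguous(pred[node][0], node):
            continue
        paths.append(walk(node))

    # Second pass: isolated cycles have no natural start, so pick any node.
    for node in sorted(nodes):
        if node not in visited:
            paths.append(walk(node))

    return paths


# Hypothetical reads: r1 -> r2 -> r3, then r3 branches to r4 and r5.
example = [("r1", "r2"), ("r2", "r3"), ("r3", "r4"), ("r3", "r5")]
print(unambiguous_paths(example))
```

In this example the chain r1, r2, r3 collapses into a single path, while r4 and r5 remain separate because the branch at r3 makes the continuation ambiguous.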
