Abstract
Background
Next-generation sequencing technologies have led to the high-throughput production of sequence data (reads) at low cost. However, these reads are significantly shorter and more error-prone than conventional Sanger shotgun reads. This poses a challenge for de novo assembly in terms of assembly quality and scalability for large-scale short-read datasets.
Results
We present DecGPU, the first parallel and distributed error correction algorithm for high-throughput short reads (HTSRs), using a hybrid combination of the CUDA and MPI parallel programming models. DecGPU provides CPU-based and GPU-based versions. The CPU-based version employs coarse-grained and fine-grained parallelism using the MPI and OpenMP parallel programming models, while the GPU-based version takes advantage of the CUDA and MPI parallel programming models and employs a hybrid CPU+GPU computing model to maximize performance by overlapping CPU and GPU computation. The distributed nature of our algorithm makes it feasible and flexible for the error correction of large-scale HTSR datasets. Using simulated and real datasets, our algorithm demonstrates superior performance, in terms of error correction quality and execution speed, compared to existing error correction algorithms. Furthermore, when combined with Velvet and ABySS, the resulting DecGPU-Velvet and DecGPU-ABySS assemblers demonstrate the potential of our algorithm to improve de novo assembly quality for de-Bruijn-graph-based assemblers.
Conclusions
DecGPU is publicly available, open-source software written in CUDA C++ and MPI. The experimental results suggest that DecGPU is an effective and feasible error correction algorithm for tackling the flood of short reads produced by next-generation sequencing technologies.
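To make the hybrid CUDA+MPI design described above more concrete, the sketch below shows one way an MPI rank can bind to a GPU and overlap host-side batch preparation with a device-side correction kernel via a CUDA stream. It is a minimal illustration under our own assumptions, not code from DecGPU: the kernel correct_batch, the helper prepare_next_batch, and the batch sizes are hypothetical placeholders for the real spectrum-based correction stages.

```cuda
// Minimal sketch (not DecGPU's actual code): one MPI rank per GPU, with the
// CPU preparing the next batch of reads while the GPU processes the current
// one, so host and device work overlap on a CUDA stream. All names here
// (correct_batch, prepare_next_batch, MAX_READ_LEN, ...) are illustrative.
#include <mpi.h>
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

#define MAX_READ_LEN 64
#define BATCH_READS  (1 << 16)

// Placeholder device kernel: in a real corrector this stage would check each
// read's k-mers against the k-mer spectrum; here it only normalizes bases.
__global__ void correct_batch(char* reads, int numReads) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < numReads) {
        char* read = reads + (size_t)tid * MAX_READ_LEN;
        for (int i = 0; i < MAX_READ_LEN; ++i)
            if (read[i] >= 'a' && read[i] <= 'z') read[i] -= 32;
    }
}

// CPU-side work for the *next* batch (e.g. parsing/encoding reads); this runs
// while the GPU is still busy with the current batch.
static void prepare_next_batch(std::vector<char>& hostBuf) {
    for (size_t i = 0; i < hostBuf.size(); ++i) hostBuf[i] = 'a';
}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, nprocs, numGPUs = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    cudaGetDeviceCount(&numGPUs);
    if (numGPUs > 0) cudaSetDevice(rank % numGPUs);  // one GPU per MPI rank

    const size_t batchBytes = (size_t)BATCH_READS * MAX_READ_LEN;
    std::vector<char> hostBuf[2];
    hostBuf[0].assign(batchBytes, 'a');
    hostBuf[1].assign(batchBytes, 'a');
    char* devBatch = nullptr;
    cudaMalloc(&devBatch, batchBytes);
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    const int numBatches = 4;                // each rank handles its own batches
    for (int b = 0; b < numBatches; ++b) {
        std::vector<char>& cur = hostBuf[b % 2];
        std::vector<char>& nxt = hostBuf[(b + 1) % 2];
        cudaMemcpyAsync(devBatch, cur.data(), batchBytes,
                        cudaMemcpyHostToDevice, stream);
        correct_batch<<<(BATCH_READS + 255) / 256, 256, 0, stream>>>(
            devBatch, BATCH_READS);
        prepare_next_batch(nxt);             // CPU work overlaps the GPU kernel
        cudaStreamSynchronize(stream);       // wait before reusing the buffers
    }

    cudaStreamDestroy(stream);
    cudaFree(devBatch);
    MPI_Barrier(MPI_COMM_WORLD);             // ranks finish their partitions together
    if (rank == 0) printf("processed %d batches on %d ranks\n", numBatches, nprocs);
    MPI_Finalize();
    return 0;
}
```

Double-buffering the host batches is what lets prepare_next_batch run while the kernel is in flight; a full pipeline would additionally copy the corrected reads back to the host and write them out.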
Highlights
Next-generation sequencing technologies have led to the high-throughput production of sequence data at low cost
The GPU-based version takes advantage of the compute unified device architecture (CUDA) and message passing interface (MPI) parallel programming models and employs a hybrid CPU+GPU computing model to maximize performance by overlapping CPU and GPU computation
We have evaluated the performance of DecGPU from three perspectives: (1) the error correction quality on both simulated and real short read datasets; (2) de novo assembly quality improvement after combining our algorithm with Velvet and ABySS; and (3) the scalability with respect to different numbers of compute resources for the CPU-based and GPU-based versions, respectively
Summary
We have evaluated the performance of DecGPU from three perspectives: (1) the error correction quality on both simulated and real short read datasets; (2) de novo assembly quality improvement after combining our algorithm with Velvet (version 1.0.17) and ABySS (version 1.2.1); and (3) the scalability with respect to different numbers of compute resources for the CPU-based and GPU-based versions, respectively. The execution speed of DecGPU is evaluated using the three real datasets in terms of: (1) the scalability of the CPU-based and GPU-based versions with respect to different numbers of compute resources, and (2) the execution time of the GPU-based version compared to that of CUDA-EC (version 1.0.1) on a single GPU. Both assessments are conducted on the previously described computing cluster. Even though our algorithm does not show good parallel scalability with respect to different numbers of compute resources, its distributed design does provide a feasible and flexible solution to the error correction of large-scale HTSR datasets
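The coarse-grained/fine-grained split mentioned in the abstract for the CPU-based version can be pictured as follows: MPI ranks each take a disjoint slice of the read set, and OpenMP threads correct the reads within that slice. The sketch below is a minimal illustration under our own assumptions, not DecGPU's actual code: correct_read stands in for the real spectrum-based correction, and the in-memory read vector stands in for streaming a FASTQ/FASTA partition.

```cuda
// Minimal sketch (not DecGPU's actual code) of coarse-grained MPI plus
// fine-grained OpenMP parallelism for a CPU-based error corrector.
// All function and variable names are illustrative placeholders.
#include <mpi.h>
#include <omp.h>
#include <algorithm>
#include <cstdio>
#include <string>
#include <vector>

// Placeholder for per-read correction against the k-mer spectrum.
static void correct_read(std::string& read) {
    for (char& c : read)
        if (c == 'n') c = 'N';               // stand-in for real base fixing
}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    // In a real run each rank would stream its slice of the FASTQ/FASTA
    // input; here a small in-memory dataset is fabricated for illustration.
    std::vector<std::string> allReads(100000, "ACGTnACGTnACGT");

    // Coarse-grained parallelism: a contiguous block of reads per MPI rank.
    size_t per   = (allReads.size() + nprocs - 1) / nprocs;
    size_t begin = std::min(allReads.size(), (size_t)rank * per);
    size_t end   = std::min(allReads.size(), begin + per);

    // Fine-grained parallelism: OpenMP threads share the rank's slice.
    #pragma omp parallel for schedule(dynamic, 1024)
    for (long i = (long)begin; i < (long)end; ++i)
        correct_read(allReads[i]);

    long localDone = (long)(end - begin), totalDone = 0;
    MPI_Reduce(&localDone, &totalDone, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("corrected %ld reads across %d ranks\n", totalDone, nprocs);

    MPI_Finalize();
    return 0;
}
```

Because each rank owns a disjoint slice of the reads, adding ranks only changes how the dataset is partitioned, which is what makes the distributed approach flexible for large-scale HTSR datasets even when parallel efficiency is imperfect.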