Trowel: a fast and accurate error correction module for Illumina sequencing reads

Eun-Cheon Lim, Jörg Hagmann, Jonas Müller, Sang-Tae Kim, Detlef Weigel, Stefan R Henz

doi:10.1093/bioinformatics/btu513

Abstract

Motivation: The ability to accurately read the order of nucleotides in DNA and RNA is fundamental for modern biology. Errors in next-generation sequencing can lead to many artifacts, from erroneous genome assemblies to mistaken inferences about RNA editing. Uneven coverage in datasets also contributes to false corrections.

Results: We introduce Trowel, a massively parallelized and highly efficient error correction module for Illumina read data. Trowel both corrects erroneous base calls and boosts base qualities based on the k-mer spectrum. With high-quality k-mers and relevant base information, Trowel achieves high accuracy for different short read sequencing applications. The latency in the data path has been significantly reduced because of efficient data access and data structures. In performance evaluations, Trowel was highly competitive with other tools regardless of coverage, genome size, read length and fragment size.

Availability and implementation: Trowel is written in C++ and is provided under the General Public License v3.0 (GPLv3). It is available at http://trowel-ec.sourceforge.net.

Contact: euncheon.lim@tue.mpg.de or weigel@tue.mpg.de

Supplementary information: Supplementary data are available at Bioinformatics online.
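To make the idea of correcting base calls from a spectrum of high-quality k-mers concrete, the following is a minimal Python sketch. It is not Trowel's implementation (Trowel is written in C++, and the abstract does not specify its algorithm). The function names `high_quality_kmers` and `correct_kmer`, the Phred threshold `min_qual`, the support threshold `min_count`, and the rule of fixing only k-mers with a single low-quality base are all assumptions made for illustration.

```python
from collections import Counter

BASES = "ACGT"

def high_quality_kmers(reads, k, min_qual):
    """Count k-mers in which every base has Phred quality >= min_qual.

    `reads` is an iterable of (sequence, qualities) pairs, where
    `qualities` is a list of integers, one per base.
    Hypothetical helper for illustration only.
    """
    spectrum = Counter()
    for seq, quals in reads:
        for i in range(len(seq) - k + 1):
            if min(quals[i:i + k]) >= min_qual:
                spectrum[seq[i:i + k]] += 1
    return spectrum

def correct_kmer(kmer, quals, spectrum, min_qual, min_count=2):
    """Return a corrected k-mer if exactly one substitution is supported.

    Only k-mers with a single low-quality base are considered (an
    assumption of this sketch). The base is replaced when exactly one
    alternative produces a k-mer seen at least `min_count` times in the
    high-quality spectrum; otherwise the k-mer is returned unchanged.
    """
    low = [i for i, q in enumerate(quals) if q < min_qual]
    if len(low) != 1:
        return kmer
    pos = low[0]
    candidates = []
    for base in BASES:
        if base == kmer[pos]:
            continue
        alt = kmer[:pos] + base + kmer[pos + 1:]
        if spectrum[alt] >= min_count:
            candidates.append(alt)
    return candidates[0] if len(candidates) == 1 else kmer

# Toy data: two confident reads and one read with a low-quality base.
reads = [
    ("ACGTAC", [40] * 6),
    ("ACGTAC", [40] * 6),
    ("ACGAAC", [40, 40, 40, 5, 40, 40]),
]
spectrum = high_quality_kmers(reads, k=6, min_qual=30)
fixed = correct_kmer("ACGAAC", [40, 40, 40, 5, 40, 40], spectrum, min_qual=30)
print(fixed)
```

In this sketch a k-mer is trusted because of its base qualities and repeated occurrence rather than a coverage cutoff alone, which is one plausible way to limit the false corrections that uneven coverage can cause.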

Full Text