Abstract

We present a probabilistic algorithm for error correction of high-throughput DNA sequencing data. Our approach builds on our prior algorithm, PREMIER, in which sequencer outputs are modeled as independent realizations of a Hidden Markov Model (HMM) and error correction is posed as maximum likelihood sequence detection over this HMM. In this work we propose an algorithm called PREMIER Turbo, which can be viewed as an iterative application of the PREMIER approach. Specifically, we apply error correction in both the forward and the backward directions of a given read. We also present a heuristic, inspired by turbo equalization, that incorporates the prior belief on each nucleotide position returned by the Baum-Welch algorithm into the error correction steps. Our approach significantly improves the correction of nucleotides at the beginning of the read. Our test results on real C. elegans and E. coli datasets show that PREMIER Turbo achieves significantly better error correction performance than other competing methods.
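To make "maximum likelihood sequence detection over an HMM" concrete, here is a minimal Viterbi sketch under strongly simplified assumptions. It is not PREMIER or PREMIER Turbo. The hidden states are the four true bases, the transition model is a hypothetical first-order "genome" model (`cyclic_model`, in which each base is usually followed by the next one in ACGT order), and the emission model assumes a base is called correctly with probability `1 - error_rate` and otherwise as one of the other three bases uniformly. The helper names `viterbi_correct` and `cyclic_model` are hypothetical. PREMIER's real state space, its Baum-Welch parameter estimation, and the forward/backward and turbo-style steps described above are not modeled here.

```python
import math

BASES = "ACGT"


def cyclic_model(p_next=0.97):
    """Hypothetical transition model: each base is usually followed by the
    next base in ACGT order; the remaining mass is split over the other three."""
    other = (1.0 - p_next) / 3.0
    return {
        r: {s: (p_next if BASES.index(s) == (BASES.index(r) + 1) % 4 else other)
            for s in BASES}
        for r in BASES
    }


def viterbi_correct(read, trans, init, error_rate):
    """Return the most likely true base sequence for `read` under a toy HMM.

    Hidden states are true bases; observations are called bases.
    Assumed emission model: correct call with probability 1 - error_rate,
    otherwise one of the 3 other bases uniformly.
    """
    def log_emit(true_base, called_base):
        p = 1.0 - error_rate if true_base == called_base else error_rate / 3.0
        return math.log(p)

    # score[s]: best log-probability of any state path ending in state s
    score = {s: math.log(init[s]) + log_emit(s, read[0]) for s in BASES}
    backptrs = []
    for called in read[1:]:
        new_score, ptr = {}, {}
        for s in BASES:
            # Best predecessor state for s at this position
            prev = max(BASES, key=lambda r: score[r] + math.log(trans[r][s]))
            new_score[s] = score[prev] + math.log(trans[prev][s]) + log_emit(s, called)
            ptr[s] = prev
        score = new_score
        backptrs.append(ptr)

    # Trace back from the best final state to recover the decoded sequence
    state = max(BASES, key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return "".join(reversed(path))


if __name__ == "__main__":
    trans = cyclic_model()
    init = {b: 0.25 for b in BASES}
    # One substitution error (G called as T) in the middle of the read
    print(viterbi_correct("ACGTACTTACGT", trans, init, error_rate=0.01))
```

Running the script prints `ACGTACGTACGT`: the mid-read error is corrected because two unlikely transitions outweigh one unlikely emission. In this toy model, an isolated error on the very first base (for example the read `TCGTACGT`) is left unchanged, since only one neighboring transition argues against it.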
