Abstract

Haplotype assembly from high-throughput sequencing data is a computationally challenging problem. Most of its formulations, including the most widely used one, which optimizes the minimum error correction (MEC) criterion, are known to be NP-hard. Because exact solutions to haplotype assembly problems are difficult to find, suboptimal heuristics are often used instead. In this paper, we propose a novel method for optimal haplotype assembly based on a depth-first branch-and-bound search of the solution space. Drawing on ideas from sphere decoding algorithms in digital communications, we exploit statistical information about errors in sequencing data to constrain the search of the haplotype space and thus find the optimal solution efficiently. Theoretical analysis and extensive simulation studies, as well as benchmarking on 1000 Genomes Project experimental data, demonstrate the efficacy of the proposed method.
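
To make the idea of a bounded depth-first search over haplotypes concrete, the following is a minimal sketch for the diploid case, not the paper's algorithm. It assumes biallelic SNPs encoded as 0/1, reads given as equal-length lists with `None` at uncovered positions, and a haplotype pair consisting of a sequence and its complement. The function name `mec_branch_and_bound` and the optional `radius` argument are hypothetical; here `radius` is simply a user-supplied cap on the MEC score, whereas the paper derives its search constraint from the statistics of sequencing errors.

```python
from typing import List, Optional, Sequence, Tuple

Read = Sequence[Optional[int]]  # 0/1 allele per SNP, None where uncovered


def mec_branch_and_bound(
    reads: Sequence[Read], radius: Optional[int] = None
) -> Tuple[Optional[List[int]], Optional[int]]:
    """Exhaustive depth-first search with pruning for the MEC-optimal haplotype.

    Returns (haplotype, mec_score), or (None, None) if no haplotype has a
    score within the given radius (an assumed, user-chosen cap).
    """
    if not reads:
        return [], 0
    n, m = len(reads[0]), len(reads)
    # Any partial solution whose bound reaches this value is pruned.
    best_cost = float("inf") if radius is None else radius + 1
    best_hap: Optional[List[int]] = None
    d_h = [0] * m  # mismatches of each read against the partial haplotype
    d_c = [0] * m  # mismatches of each read against its complement
    hap: List[int] = []

    def search(pos: int) -> None:
        nonlocal best_cost, best_hap
        # Prefix mismatches never decrease, so this is a valid lower bound
        # on the MEC score of every completion of the current prefix.
        bound = sum(min(a, b) for a, b in zip(d_h, d_c))
        if bound >= best_cost:
            return
        if pos == n:
            best_cost, best_hap = bound, hap.copy()
            return
        # A haplotype and its complement score the same, so fix SNP 0 to 0.
        for allele in ((0,) if pos == 0 else (0, 1)):
            touched = []
            for i, read in enumerate(reads):
                obs = read[pos]
                if obs is None:
                    continue
                counts = d_h if obs != allele else d_c
                counts[i] += 1
                touched.append(counts is d_h and (d_h, i) or (d_c, i))
            hap.append(allele)
            search(pos + 1)
            hap.pop()
            for counts, i in touched:  # undo before trying the next allele
                counts[i] -= 1

    search(0)
    if best_hap is None:
        return None, None
    return best_hap, int(best_cost)


reads = [
    [0, 0, 1, None],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 1, 0, None],
    [0, 1, 1, 1],  # one sequencing error at the second SNP
]
haplotype, score = mec_branch_and_bound(reads)
```

The worst case remains exponential in the number of SNPs, as expected for an NP-hard problem; the pruning only helps when the bound rises quickly. A tighter `radius` discards more of the search tree but can exclude every haplotype, which is why the sketch returns `(None, None)` in that case.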
