HapCHAT: adaptive haplotype assembly for efficiently leveraging high coverage in long reads

Stefano Beretta,Murray D Patterson,Paola Bonizzoni,Simone Zaccaria,Gianluca Della Vedova

doi:10.1186/s12859-018-2253-8


Open Access

https://doi.org/10.1186/s12859-018-2253-8


Abstract

Background: Haplotype assembly is the process of assigning the different alleles of the variants covered by mapped sequencing reads to the two haplotypes of the genome of a human individual. Long reads, which are now cheaper to produce and more widely available than ever before, have been used to reduce the fragmentation of the assembled haplotypes, owing to their ability to span several variants along the genome. These long reads are also characterized by a high error rate, an issue which may be mitigated, however, with larger sets of reads, when this error rate is uniform across genome positions. Unfortunately, current state-of-the-art dynamic programming approaches designed for long reads deal only with limited coverages.

Results: Here, we propose a new method for assembling haplotypes which combines and extends the features of previous approaches to deal with long reads and higher coverages. In particular, our algorithm is able to dynamically adapt the estimated number of errors at each variant site, while minimizing the total number of error corrections necessary for finding a feasible solution. This allows our method to significantly reduce the required computational resources, making it possible to consider datasets with higher coverages. The algorithm has been implemented in a freely available tool, HapCHAT: Haplotype Assembly Coverage Handling by Adapting Thresholds. An experimental analysis on sequencing reads with up to 60× coverage reveals improvements in accuracy and recall achieved by considering a higher coverage with lower runtimes.

Conclusions: Our method leverages the long-range information of sequencing reads, which allows it to obtain assembled haplotypes fragmented into a lower number of unphased haplotype blocks. At the same time, our method is also able to deal with higher coverages to better correct the errors in the original reads and to obtain more accurate haplotypes as a result.

Availability: HapCHAT is available at http://hapchat.algolab.eu under the GNU Public License (GPL).

Highlights

Haplotype assembly is the process of assigning the different alleles of the variants covered by mapped sequencing reads to the two haplotypes of the genome of a human individual.
Availability: HapCHAT (Haplotype Assembly Coverage Handling by Adapting Thresholds) is available at http://hapchat.algolab.eu under the GNU Public License (GPL).
We report the results obtained by running the tools with maximum coverage 30× for HapCHAT, 25× for HapCol, and 15× and 20× for WhatsHap.

Summary

Introduction

Haplotype assembly is the process of assigning the different alleles of the variants covered by mapped sequencing reads to the two haplotypes of the genome of a human individual. Due to the availability of curated, high-quality haplotype reference panels on a large population of individuals [8, 9], computational methods for statistically inferring the haplotypes of an individual from these panels are widely used [1, 10]. The accuracy of these methods depends heavily on the size and diversity of the population used to compile the panels, entailing poor performance on rare variants, while de novo variants are completely missed. These types of variants appear in the sequencing reads of the individual, making read-based haplotype assembly the obvious solution.
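To make the notion of read-based haplotype assembly concrete, the following is a minimal toy sketch in Python of the error-correction objective mentioned in the abstract: find the haplotype pair that requires the fewest allele corrections in the reads. It is not HapCHAT's algorithm. It is an exhaustive search that is only feasible for a handful of sites, and it assumes (our assumption, not stated on this page) that every site is heterozygous, so the second haplotype is the complement of the first. The function names `mec_cost` and `brute_force_mec`, the read encoding, and the toy data are all hypothetical.

```python
from itertools import product

def mec_cost(reads, haplotype):
    """Number of allele corrections needed to make every read agree with
    either `haplotype` or its complement (assumed all-heterozygous sites).

    Each read is a dict {site_index: allele}, with alleles encoded as 0/1.
    """
    complement = [1 - a for a in haplotype]
    total = 0
    for read in reads:
        # Mismatches if the read is assigned to the first haplotype...
        d0 = sum(1 for site, allele in read.items() if haplotype[site] != allele)
        # ...or to the second (complementary) haplotype.
        d1 = sum(1 for site, allele in read.items() if complement[site] != allele)
        total += min(d0, d1)
    return total

def brute_force_mec(reads, n_sites):
    """Exhaustively search haplotypes; returns (cost, haplotype).

    Exponential in n_sites, so this is for illustration only.
    """
    best = None
    for hap in product((0, 1), repeat=n_sites):
        if hap[0] == 1:
            continue  # a haplotype and its complement describe the same pair
        cost = mec_cost(reads, hap)
        if best is None or cost < best[0]:
            best = (cost, list(hap))
    return best

# Hypothetical toy data: five reads over four variant sites.
# The last read carries one sequencing error at site 3.
reads = [
    {0: 0, 1: 0, 2: 1},
    {0: 0, 1: 0, 2: 1, 3: 1},
    {1: 1, 2: 0, 3: 0},
    {0: 1, 1: 1, 2: 0},
    {1: 0, 2: 1, 3: 0},
]

cost, hap = brute_force_mec(reads, n_sites=4)
print(cost, hap)  # 1 [0, 0, 1, 1]
```

On this toy input a single correction suffices, and the recovered pair is `[0, 0, 1, 1]` with its complement `[1, 1, 0, 0]`. Practical tools replace the exhaustive search with dynamic programming over variant sites, which is where coverage becomes the limiting factor.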



Journal: BMC Bioinformatics	Publication Date: Jul 3, 2018
Citations: 8	License type: open-access


