Improving population scale statistical phasing with whole-genome sequencing data.

Rick Wertenbroek, Robin J Hofmeister, Ioannis Xenarios, Yann Thoma, Olivier Delaneau

doi:10.1371/journal.pgen.1011092

Abstract

Haplotype estimation, or phasing, has gained significant traction in large-scale projects because of its contributions to population genetics, variant analysis, and the creation of reference panels for imputing and phasing new samples. To scale with the growing number of samples, population-scale haplotype estimation methods rely on highly optimized statistical models to phase genotype data and usually ignore read-level information. Statistical methods excel at resolving common variants; however, they still struggle with rare variants because of the lack of statistical information. In this study we introduce SAPPHIRE, a new method that leverages whole-genome sequencing data to enhance the precision of haplotype calls produced by statistical phasing. SAPPHIRE refines haplotype estimates by realigning sequencing reads, particularly at low-confidence phase calls. Our findings demonstrate that SAPPHIRE significantly improves the accuracy of haplotypes obtained from state-of-the-art methods and also identifies the subset of phase calls that are validated by sequencing reads. Finally, we show that our method scales to large data sets by successfully applying it to the 3.6 petabytes of sequencing data in the latest UK Biobank release of 200,031 samples.
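To make the general idea of read-based phase checking concrete, here is a minimal sketch. It is not SAPPHIRE's implementation. It assumes that statistical phasing has already produced a phase call and a phase probability for each heterozygous site, and that for each sequencing read spanning two heterozygous sites we know the allele it carries at both sites (0 = reference, 1 = alternative). Reads showing 0-0 or 1-1 support a "cis" configuration (both alternative alleles on the same haplotype); reads showing 0-1 or 1-0 support "trans". The helper names, the 0.99 confidence threshold, the `min_reads` cutoff and the simple majority rule are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence, Tuple

# Hypothetical encoding: alleles (site A, site B) carried by haplotype 1.
# Haplotype 2 carries the complementary alleles because both sites are heterozygous.
Phase = Tuple[int, int]


def low_confidence_sites(phase_probs: Sequence[float], threshold: float = 0.99) -> List[int]:
    """Return indices of sites whose phase probability is below a (hypothetical) threshold."""
    return [i for i, p in enumerate(phase_probs) if p < threshold]


@dataclass
class PhaseCheck:
    """Result of comparing one statistical phase call with read evidence."""
    cis_reads: int
    trans_reads: int
    supported_by_reads: bool  # reads gave a clear majority for one configuration
    flipped: bool             # the statistical call was changed to match the reads
    phase: Phase              # final alleles on haplotype 1


def check_pair_phase(statistical_phase: Phase,
                     read_alleles: Iterable[Tuple[int, int]],
                     min_reads: int = 2) -> PhaseCheck:
    """Compare the phase of two heterozygous sites with the reads spanning both.

    A read may come from either haplotype, but cis/trans status does not depend
    on which one, so reads can be pooled without knowing their origin.
    """
    cis = trans = 0
    for a, b in read_alleles:
        if a == b:
            cis += 1    # 0-0 or 1-1: alternative alleles on the same haplotype
        else:
            trans += 1  # 0-1 or 1-0: alternative alleles on different haplotypes

    if cis + trans < min_reads or cis == trans:
        # Too few or conflicting reads: keep the statistical call unvalidated.
        return PhaseCheck(cis, trans, False, False, statistical_phase)

    stat_is_cis = statistical_phase[0] == statistical_phase[1]
    reads_say_cis = cis > trans
    if reads_say_cis == stat_is_cis:
        return PhaseCheck(cis, trans, True, False, statistical_phase)

    # Reads disagree with the statistical call: swap site B between haplotypes.
    flipped = (statistical_phase[0], 1 - statistical_phase[1])
    return PhaseCheck(cis, trans, True, True, flipped)


if __name__ == "__main__":
    print(low_confidence_sites([0.999, 0.6, 0.95]))
    print(check_pair_phase((1, 0), [(1, 1), (0, 0), (1, 1)]))
```

In the usage example, site 0 is confidently phased and would be left alone, while sites 1 and 2 would be re-examined. For the pair check, three reads all support cis while the statistical call is trans, so the sketch flips the call to `(1, 1)`. A real tool would also need to handle sequencing errors, mapping quality and sites more than one read length apart, which this sketch ignores.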

Journal: PLOS Genetics
Publication Date: Jul 3, 2024
License type: CC BY 4.0

Similar Papers

A naturally occurring mitochondrial genome variant confers broad protection from infection in Drosophila
Tiina S Salminen ... Pedro F Vale
PLOS Genetics | VOL. 20
11 Nov 2024

Prediction of causal genes at GWAS loci with pleiotropic gene regulatory effects using sets of correlated instrumental variables
Mariyam Khan ... Tom Michoel
PLOS Genetics | VOL. 20
11 Nov 2024

Glucocerebrosidase deficiency leads to neuropathology via cellular immune activation
Evelyn S Vincow ... Leo J Pallanck
PLOS Genetics | VOL. 20
11 Nov 2024

Repeat mediated excision of gene drive elements for restoring wild-type populations
Pratima R Chennuri ... Kevin M Myles
PLOS Genetics | VOL. -
07 Nov 2024
