A strategy to improve phasing of whole-genome sequenced individuals through integration of familial information from dense genotype panels

Pierre Faux, Tom Druet

doi:10.1186/s12711-017-0321-6

Open Access

Journal: Genetics Selection Evolution	Publication Date: May 16, 2017
Citations: 5	License type: open-access

Affiliation: University of Liège

Abstract

Background: Haplotype reconstruction (phasing) is an essential step in many applications, including imputation and genomic selection. The best phasing methods rely on both familial and linkage disequilibrium (LD) information. With whole-genome sequence (WGS) data, relatively small samples of reference individuals are generally sequenced due to prohibitive sequencing costs, thus only a limited amount of familial information is available. However, reference individuals have many relatives that have been genotyped (at lower density). The goal of our study was to improve phasing of WGS data by integrating familial information from haplotypes that were obtained from a larger genotyped dataset and to quantify its impact on imputation accuracy.

Results: Aligning a pre-phased WGS panel [~5 million single nucleotide polymorphisms (SNPs)], which is based on LD information only, to a 50k SNP array that is phased with both LD and familial information (called scaffold) resulted in correctly assigning parental origin for 99.62% of the WGS SNPs whose phase could be determined unambiguously from parental genotypes. Without using the 50k haplotypes as scaffold, that value dropped, as expected, to 50%. Correctly phased segments were on average longer after alignment to the genotype phase, while the number of switches decreased slightly. Most of the incorrectly assigned segments, and subsequent switches, were due to singleton errors. Using the improved phasing when imputing from the 50k SNP array to WGS data had only a marginal impact on imputation accuracy (measured as r²): on average, 90.47% with traditional techniques versus 90.65% with pre-phasing integrating familial information. Differences were larger for SNPs located at chromosome ends and for rare variants. Using a denser WGS panel (~13 million SNPs) that was obtained with traditional variant filtering rules, we found similar results, although both phasing and imputation accuracy were lower.

Conclusions: We present a phasing strategy for WGS data that indirectly integrates familial information by aligning WGS haplotypes, pre-phased with LD information only, to haplotypes obtained from genotyping data that were phased with both LD and familial information in a much larger population. This strategy results in very few mismatches with the phase obtained by Mendelian segregation rules. Finally, we propose a strategy to further improve phasing accuracy based on haplotype clusters obtained with genotyping data.
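The alignment step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes a simple nearest-anchor rule, in which each WGS SNP inherits the orientation found at the closest heterozygous scaffold marker. The function name `align_to_scaffold`, the dictionary layout of `scaffold`, and the 0/1 allele coding are all hypothetical choices made for illustration.

```python
from bisect import bisect_left


def align_to_scaffold(wgs_a, wgs_b, scaffold):
    """Assign parental origin to two LD-phased WGS haplotypes.

    wgs_a, wgs_b : lists of 0/1 alleles, same length (LD-only phase).
    scaffold     : dict {WGS SNP index: (paternal_allele, maternal_allele)}
                   for markers shared with the scaffold panel (assumed layout).
    Returns (paternal, maternal) haplotypes as lists.
    """
    # Keep only informative anchors: heterozygous in the scaffold and
    # consistent with the WGS alleles at the same SNP.
    anchors = []  # (index, flip) where flip=True means wgs_b is paternal
    for idx in sorted(scaffold):
        pat, mat = scaffold[idx]
        if pat == mat:
            continue
        observed = (wgs_a[idx], wgs_b[idx])
        if observed == (pat, mat):
            anchors.append((idx, False))
        elif observed == (mat, pat):
            anchors.append((idx, True))
    if not anchors:
        raise ValueError("no informative scaffold marker")

    anchor_idx = [i for i, _ in anchors]
    paternal, maternal = [], []
    for i, (a, b) in enumerate(zip(wgs_a, wgs_b)):
        # Locate the nearest anchor (ties go to the left-hand anchor).
        k = bisect_left(anchor_idx, i)
        if k == len(anchor_idx):
            k -= 1
        elif k > 0 and i - anchor_idx[k - 1] <= anchor_idx[k] - i:
            k -= 1
        flip = anchors[k][1]
        paternal.append(b if flip else a)
        maternal.append(a if flip else b)
    return paternal, maternal
```

With an LD-only phase that switches parental origin partway along the chromosome, two scaffold markers on either side of the switch are enough for this rule to restore a consistent paternal/maternal labelling, provided the switch falls near the midpoint between them. A real pipeline would need to handle genotype errors and sparse anchors more carefully.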

Highlights

Haplotype reconstruction is an essential step in many applications, including imputation and genomic selection
The results indicate that phasing with linkage disequilibrium (LD) information only (WGS-P1 phase) leads to random assignment of parental origin: about 50% of single nucleotide polymorphisms (SNPs) are not correctly phased
The distances between consecutive switches are larger for the WGS-P2 phase (3.19 Mb) than for the WGS-P1 phase (3.01 Mb). We found that any whole-genome sequence (WGS) SNP was located, on average, 7.8 Mb from the closest switch for the WGS-P2 phase, whereas it was only 6.7 Mb away for the WGS-P1 phase (Table 5)
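The two switch statistics quoted above (distance between consecutive switches, and distance from a SNP to its closest switch) can be sketched as follows. The definitions used here are assumptions for illustration, not necessarily those of the paper: a switch is placed at the midpoint between two consecutive heterozygous SNPs whose inferred parental origin differs, and positions are in base pairs. All function names are hypothetical.

```python
from bisect import bisect_left
from statistics import mean


def switch_positions(inferred_pat, true_pat, true_mat, positions):
    """Return switch locations along one inferred paternal haplotype."""
    switches = []
    prev_origin, prev_pos = None, None
    for hap, p, m, pos in zip(inferred_pat, true_pat, true_mat, positions):
        if p == m:
            continue  # homozygous sites carry no phase information
        origin = 0 if hap == p else 1  # 0 = truly paternal, 1 = truly maternal
        if prev_origin is not None and origin != prev_origin:
            # Assumption: place the switch midway between the flanking sites.
            switches.append((prev_pos + pos) / 2)
        prev_origin, prev_pos = origin, pos
    return switches


def mean_gap_between_switches(switches):
    """Average distance between consecutive switches (None if fewer than 2)."""
    gaps = [b - a for a, b in zip(switches, switches[1:])]
    return mean(gaps) if gaps else None


def mean_distance_to_closest_switch(positions, switches):
    """Average distance from each SNP to its nearest switch (None if no switch)."""
    if not switches:
        return None
    distances = []
    for pos in positions:
        k = bisect_left(switches, pos)
        # Only the switches immediately left and right of pos can be closest.
        candidates = switches[max(k - 1, 0):k + 1]
        distances.append(min(abs(pos - s) for s in candidates))
    return mean(distances)
```

Both summaries depend on the same list of switch locations, so they can be computed from a single pass over the heterozygous sites of each individual and then averaged across individuals.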

Summary

Introduction

Haplotype reconstruction (phasing) is an essential step in many applications, including imputation and genomic selection. The best phasing methods rely on both familial and linkage disequilibrium (LD) information. With whole-genome sequence (WGS) data, relatively small samples of reference individuals are generally sequenced due to prohibitive sequencing costs, thus only a limited amount of familial information is available. The goal of our study was to improve phasing of WGS data by integrating familial information from haplotypes that were obtained from a larger genotyped dataset and to quantify its impact on imputation accuracy. [...] for each marker, the combination of marker alleles that are carried by an individual. Most haplotyping methods rely either on familial information (e.g., [14, 15]), linkage disequilibrium [...]. Note that the so-called long-range phasing (LRP) methods achieve haplotype reconstruction at long distances without requiring explicit familial information.
