Genotype imputation using the Positional Burrows Wheeler Transform.

Simone Rubinacci, Jonathan Marchini, Olivier Delaneau

doi:10.1371/journal.pgen.1009049


Open Access

https://doi.org/10.1371/journal.pgen.1009049


Abstract

Genotype imputation is the process of predicting unobserved genotypes in a sample of individuals using a reference panel of haplotypes. In the last 10 years, reference panels have increased in size by more than 100-fold. Increasing reference panel size improves the accuracy of markers with low minor allele frequencies but poses ever-increasing computational challenges for imputation methods. Here we present IMPUTE5, a genotype imputation method that can scale to reference panels with millions of samples. This method continues to refine the observation made in the IMPUTE2 method, that accuracy is optimized via use of a custom subset of haplotypes when imputing each individual. It achieves fast, accurate, and memory-efficient imputation by selecting haplotypes using the Positional Burrows Wheeler Transform (PBWT). By using the PBWT data structure at genotyped markers, IMPUTE5 identifies locally best matching haplotypes and long identical-by-state segments. The method then uses the selected haplotypes as conditioning states within the IMPUTE model. Using the HRC reference panel, which has ∼65,000 haplotypes, we show that IMPUTE5 is up to 30x faster than MINIMAC4 and up to 3x faster than BEAGLE5.1, and uses less memory than both of these methods. Using simulated reference panels, we show that IMPUTE5 scales sub-linearly with reference panel size. For example, keeping the number of imputed markers constant, increasing the reference panel size from 10,000 to 1 million haplotypes requires less than twice the computation time. As the reference panel increases in size, IMPUTE5 is able to utilize a smaller number of reference haplotypes, thus reducing computational cost.
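The abstract's idea of using the PBWT to find locally best matching haplotypes can be illustrated with a minimal sketch. It builds the standard positional prefix ordering, a stable partition of haplotypes by allele at each site, so that haplotypes sharing a long match ending at a site end up adjacent in the order. It is not IMPUTE5's actual selection procedure: the function names `pbwt_orders` and `neighbours`, the toy haplotypes, and the choice to place the target haplotype inside the panel are all assumptions made for illustration.

```python
def pbwt_orders(haps):
    """Return the positional prefix order of haplotype indices after each site.

    haps: list of equal-length lists of 0/1 alleles (hypothetical toy input).
    After site k, haplotypes are sorted by their reversed prefixes up to k,
    so neighbours in the order share long matches ending at k.
    """
    order = list(range(len(haps)))
    orders = []
    n_sites = len(haps[0]) if haps else 0
    for k in range(n_sites):
        # Stable partition: allele-0 haplotypes first, then allele-1,
        # each group keeping its relative order from the previous site.
        zeros = [i for i in order if haps[i][k] == 0]
        ones = [i for i in order if haps[i][k] == 1]
        order = zeros + ones
        orders.append(order)
    return orders


def neighbours(order, target, n):
    """Return up to n haplotypes on each side of `target` in `order`."""
    pos = order.index(target)
    lo = max(0, pos - n)
    return [i for i in order[lo:pos + n + 1] if i != target]


# Toy panel: haplotype 3 plays the role of the target (assumption).
haps = [
    [0, 1, 0, 1],  # 0: identical to the target
    [1, 1, 0, 1],  # 1: matches the target over the last three sites
    [0, 0, 1, 0],  # 2: matches the target at the first site only
    [0, 1, 0, 1],  # 3: target
]
final_order = pbwt_orders(haps)[-1]
selected = neighbours(final_order, target=3, n=1)
```

In this toy panel the haplotypes adjacent to the target at the last site are the two with the longest matches ending there, which is the property a PBWT-based selection step relies on.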

Highlights

Genotype imputation is a widely used method in human genetic studies that infers unobserved genotypes in a sample of individuals
Reference panels continue to grow in size, which improves the accuracy of predictions, so methods need to be able to scale to this increased size
Study genotype data are combined with a reference panel of haplotypes with many tens of millions of markers, and a statistical model is used to predict the genotypes at these markers in the study samples [1]

Summary

Introduction

Genotype imputation is a widely used method in human genetic studies that infers unobserved genotypes in a sample of individuals. The study samples are genotyped on a SNP microarray with between 300,000 and 5 million markers. These data are combined with a reference panel of haplotypes with many tens of millions of markers, and a statistical model is used to predict the genotypes at these markers in the study samples [1]. Imputed datasets increase the number of markers that can be tested for association. In the UK Biobank dataset [2], imputation increased the number of testable markers from 825,927 to over 96 million. This increased number of SNPs can boost the power of the study. Imputation can also be used to predict the markers needed to calculate polygenic risk scores (PRSs), which typically involve a weighted sum of genotypes across the genome.
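The weighted sum behind a polygenic risk score can be sketched in a few lines. This is a minimal illustration only: the function name `polygenic_risk_score`, the example dosages, and the per-marker weights are hypothetical and are not taken from the paper.

```python
def polygenic_risk_score(dosages, weights):
    """Weighted sum of genotype dosages across markers.

    dosages: per-marker allele dosages in [0, 2]; imputed values may be
             fractional (hypothetical example data).
    weights: per-marker effect-size weights (hypothetical).
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(w * g for w, g in zip(weights, dosages))


# Four markers: two observed on the array, two imputed (illustrative values).
example_dosages = [0.0, 1.0, 2.0, 1.5]
example_weights = [0.2, -0.1, 0.05, 0.3]
example_score = polygenic_risk_score(example_dosages, example_weights)
```

Because imputed dosages can be fractional, a marker that was never genotyped directly can still contribute to the score.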



Journal: PLoS Genetics
Publication Date: Nov 16, 2020
Citations: 112
License type: CC BY 4.0


Similar Papers

A genotype imputation method for de-identified haplotype reference information by using recurrent neural network.
Kaname Kojima ... Fumiki Katsuoka
PLOS Computational Biology | VOL. 16
01 Oct 2020

A One-Penny Imputed Genome from Next-Generation Reference Panels
Brian L Browning ... Sharon R Browning
The American Journal of Human Genetics | VOL. 103
09 Aug 2018

Systematic comparison of genotype imputation strategies in aquaculture: A case study in Nile tilapia (Oreochromis niloticus) populations
Shaopan Ye ... Hongyu Ma
Aquaculture | VOL. 592
06 Jun 2024

A multi-breed reference panel and additional rare variants maximize imputation accuracy in cattle
Troy N Rowan ... Jared E Decker
Genetics Selection Evolution | VOL. 51
01 Dec 2019
