Abstract

We report a novel algorithm, iBLUP, to impute missing genotypes by simultaneously and comprehensively using identity by descent and linkage disequilibrium information. The simulation studies showed that the algorithm exhibited drastically tolerance to high missing rate, especially for rare variants than other common imputation methods, e.g. BEAGLE and fastPHASE. At a missing rate of 70%, the accuracy of BEAGLE and fastPHASE dropped to 0.82 and 0.74 respectively while iBLUP retained an accuracy of 0.95. For minor allele, the accuracy of BEAGLE and fastPHASE decreased to −0.1 and 0.03, while iBLUP still had an accuracy of 0.61.We implemented the algorithm in a publicly available software package also named iBLUP. The application of iBLUP for processing real sequencing data in an outbred pig population was demonstrated.

Highlights

  • Benefited from the advances of sequencing technologies, Genome-Wide Association Studies (GWAS) have revealed substantial genetic loci controlling human diseases and agriculturally important traits [1,2,3]

  • To take full advantage of a multivariate mixed model (M-MM) to fully incorporate both linkage disequilibrium (LD) and identity by decent (IBD) simultaneously, we made two major changes to enhance the representations of marker IBD information on the relationship matrix (K) among individuals, and marker LD information on the covariance matrix (G) of underlying variables (See Figure 1)

  • Missing genotype imputation is a critical process between sequencing and utilization for GWAS and genomic prediction [29,30,31]

Read more

Summary

Introduction

Benefited from the advances of sequencing technologies, Genome-Wide Association Studies (GWAS) have revealed substantial genetic loci controlling human diseases and agriculturally important traits [1,2,3]. The identified loci collectively explain only a small proportion of total variation [4,5,6,7]. Multiplexing is one the advances that revolutionized the high throughput Genotyping By Sequencing (GBS). Samples are individually tagged and pooled into a single lane of flow cell. It exponentially increases the number of samples analyzed in a single run without dramatically increasing cost and time [9]

Objectives
Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call