Assessment of the performance of hidden Markov models for imputation in animal breeding

Andrew Whalen, Gregor Gorjanc, Roger Ros-Freixedes, John M Hickey

doi:10.1186/s12711-018-0416-8


Open Access



Abstract

Background: In this paper, we review the performance of various hidden Markov model-based imputation methods in animal breeding populations. Traditionally, pedigree and heuristic-based imputation methods have been used for imputation in large animal populations due to their computational efficiency, scalability, and accuracy. Recent advances in human genetics have increased the ability of probabilistic hidden Markov model methods to perform accurate phasing and imputation in large populations. These advances may make these methods suitable for routine use in large animal populations, particularly in populations where pedigree information is not readily available.

Methods: To test the performance of hidden Markov model-based imputation, we evaluated the accuracy and computational cost of several methods in a series of simulated populations and a real animal population without using a pedigree. First, we tested single-step (diploid) imputation, which performs both phasing and imputation. Second, we tested pre-phasing followed by haploid imputation. Overall, we used four available diploid imputation methods (fastPHASE, Beagle v4.0, IMPUTE2, and MaCH), three phasing methods (SHAPEIT2, HAPI-UR, and Eagle2), and three haploid imputation methods (IMPUTE2, Beagle v4.1, and Minimac3).

Results: We found that performing pre-phasing and haploid imputation was faster and more accurate than diploid imputation. In particular, among all the methods tested, pre-phasing with Eagle2 or HAPI-UR and imputing with Minimac3 or IMPUTE2 gave the highest accuracies with both simulated and real data.

Conclusions: The results of this study suggest that hidden Markov model-based imputation algorithms are an accurate and computationally feasible approach for performing imputation without a pedigree when pre-phasing and haploid imputation are used. Of the algorithms tested, the combination of Eagle2 and Minimac3 gave the highest accuracy across the simulated and real datasets.
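
To make the haploid imputation step mentioned in the abstract more concrete, the following is a minimal sketch of a generic haplotype-copying hidden Markov model, in which a pre-phased target haplotype is modelled as a mosaic of reference haplotypes. It is not the implementation used by IMPUTE2, Beagle v4.1, or Minimac3. The function name `impute_haploid`, the constant per-marker `switch` probability, and the `error` rate are assumptions chosen for illustration only.

```python
def impute_haploid(reference, target, switch=0.05, error=0.01):
    """Toy haploid imputation with a haplotype-copying HMM (illustrative only).

    reference: list of K reference haplotypes, each a list of M alleles (0 or 1)
    target:    list of M alleles (0, 1, or None where the allele is missing)
    switch:    assumed probability of re-drawing the copied haplotype between markers
    error:     assumed probability that an observed allele mismatches the copied one
    Returns a list of M dosages: the observed allele where typed, otherwise the
    posterior probability that the missing allele is 1.
    """
    K, M = len(reference), len(target)

    def emission(k, m):
        # A missing allele carries no information about the hidden state.
        if target[m] is None:
            return 1.0
        return 1.0 - error if reference[k][m] == target[m] else error

    def normalise(values):
        total = sum(values)
        return [v / total for v in values]

    # Forward pass; each column is rescaled to sum to 1 for numerical stability.
    fwd = [normalise([emission(k, 0) / K for k in range(K)])]
    for m in range(1, M):
        prev = fwd[-1]
        fwd.append(normalise([
            ((1.0 - switch) * prev[k] + switch / K) * emission(k, m)
            for k in range(K)
        ]))

    # Backward pass, rescaled in the same way.
    bwd = [[1.0] * K for _ in range(M)]
    for m in range(M - 2, -1, -1):
        nxt = [bwd[m + 1][k] * emission(k, m + 1) for k in range(K)]
        total = sum(nxt)
        bwd[m] = normalise([
            (1.0 - switch) * nxt[k] + switch * total / K for k in range(K)
        ])

    dosages = []
    for m in range(M):
        if target[m] is not None:
            dosages.append(float(target[m]))
            continue
        # Posterior over which reference haplotype is copied at marker m.
        post = normalise([fwd[m][k] * bwd[m][k] for k in range(K)])
        dosages.append(sum(post[k] * reference[k][m] for k in range(K)))
    return dosages


# Hypothetical toy data: three reference haplotypes, target typed at markers 0, 2, 4.
reference_panel = [[0, 0, 0, 0, 0], [1, 1, 1, 1, 1], [0, 1, 0, 1, 0]]
print(impute_haploid(reference_panel, [1, None, 1, None, 1]))
```

Because the haplotype is already phased, the hidden state is a single reference haplotype rather than a pair, so each marker costs on the order of K operations instead of K squared. This is one plausible reason why the pre-phasing route can be faster than single-step diploid imputation.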

Highlights

In this paper, we review and analyse the use of imputation methods based on hidden Markov models (HMM) for animal breeding populations
We evaluated the performance of four diploid imputation methods (fastPHASE, Beagle v4.0, IMPUTE2, and MaCH) and of three phasing methods (SHAPEIT2, HAPI-UR, and Eagle2) followed by three haploid imputation methods (IMPUTE2, Beagle v4.1, and Minimac3) on a series of simulated datasets and a real dataset

Summary

Introduction

We review and analyse the performance of imputation methods based on hidden Markov models (HMM) in animal breeding populations. Recent advances in human genetics have increased the ability of probabilistic hidden Markov model methods to perform accurate phasing and imputation in large populations. These advances may make these methods suitable for routine use in large animal populations, particularly in populations where pedigree information is not readily available. A larger number of genotyped animals increases the accuracy of genetic predictions [6] and/or offers the potential to increase selection intensity [7, 8].

Methods

Results

Discussion

Conclusion