Abstract

The aim of this study was to investigate the effect of different strategies for handling low-quality or missing data on prediction accuracy for direct genomic values of protein yield, mastitis and fertility using a Bayesian variable model and a GBLUP model in the Danish Jersey population. The data comprised 1,071 Jersey bulls genotyped with the Illumina Bovine 50K chip. After preliminary editing, 39,227 SNPs remained in the dataset. Four methods for handling missing genotypes were compared: 1) BEAGLE: missing markers were imputed using Beagle 3.3 software, 2) COMMON: missing genotypes at a locus were replaced by the most common genotype observed at this locus in the marker data, 3) EX-ALLELE: missing marker genotypes at a locus were treated as an extra allele, and 4) POP-EXP: missing genotypes at a locus were replaced with the population expectation at this locus. Among the methods used in this study, imputation with Beagle was the best approach to handling missing genotypes. Treating missing markers as a pseudo-allele, replacing missing markers with a population average, or substituting the most common genotype each reduced the accuracy of genomic predictions. The results suggest that missing genotypes should be imputed in order to improve genomic prediction. Editing the marker data with a stringent threshold on GenCall (GC) scores and then imputing the discarded genotypes did not lead to higher accuracy. All marker genotypes with a GC score over 0.15 should be retained for genomic prediction.
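
The three non-Beagle strategies can be illustrated with a minimal Python sketch for a single locus. It is not the authors' implementation. It assumes genotypes are coded as 0, 1 or 2 copies of one allele with None for a missing call, that the "population expectation" is the mean observed allele count at the locus, and that an "extra allele" can be represented by a separate missing-indicator covariate. The function names are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Tuple

# Assumed coding: 0, 1 or 2 copies of one allele; None marks a missing call.
Genotype = Optional[int]


def _observed(column: List[Genotype]) -> List[int]:
    """Return the non-missing genotypes at a locus, or fail if there are none."""
    observed = [g for g in column if g is not None]
    if not observed:
        raise ValueError("locus has no observed genotypes")
    return observed


def impute_common(column: List[Genotype]) -> List[int]:
    """COMMON: replace missing calls with the most common observed genotype."""
    most_common = Counter(_observed(column)).most_common(1)[0][0]
    return [most_common if g is None else g for g in column]


def impute_pop_exp(column: List[Genotype]) -> List[float]:
    """POP-EXP: replace missing calls with the mean observed allele count
    (an assumed definition of the population expectation)."""
    observed = _observed(column)
    mean = sum(observed) / len(observed)
    return [mean if g is None else float(g) for g in column]


def encode_extra_allele(column: List[Genotype]) -> Tuple[List[int], List[int]]:
    """EX-ALLELE (one possible encoding, assumed here): keep the allele count,
    set to 0 where missing, and add an indicator covariate for 'missing'."""
    counts = [0 if g is None else g for g in column]
    missing = [1 if g is None else 0 for g in column]
    return counts, missing


locus = [0, 2, None, 2, 1, None]
print(impute_common(locus))
print(impute_pop_exp(locus))
print(encode_extra_allele(locus))
```

The BEAGLE strategy is not sketched because it relies on the external Beagle 3.3 software, which imputes missing calls from haplotype information across loci rather than from a single locus.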

Highlights

  • A number of factors determine the benefit from genomic selection (Meuwissen et al 2001)

  • Four methods were used to handle missing genotypes: 1) BEAGLE: missing markers were imputed using Beagle 3.3 (Browning & Browning 2007) with default settings, 2) COMMON: missing genotypes at a locus were replaced by the most common genotype observed at this locus in the marker data, 3) EX-ALLELE: missing marker genotypes at a locus were treated as an extra allele and 4) POP-EXP: missing genotypes at a locus were replaced with the population expectation at this locus

  • Handling missing marker genotypes with BEAGLE or COMMON led to slightly higher accuracies of direct genomic breeding value (DGV) than EX-ALLELE for all three traits

Summary

Introduction

A number of factors determine the benefit from genomic selection (Meuwissen et al 2001). The accuracy of genomic predictions depends on, among other things, reference population size (Hayes et al 2009a), heritability of the traits (Goddard 2009, Hayes et al 2009a, Su et al 2010), marker density (Moser et al 2010), effective population size (Goddard 2009) and relatedness between the reference and validation populations (Habier et al 2010). Another factor is the quality of the available marker information. Some of the available marker information can be wrong (e.g. due to genotyping error).
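
Editing marker data on call quality can be sketched as follows. This is a minimal illustration, not the authors' pipeline. It assumes each genotype call comes with a GenCall (GC) score, and that calls whose score is not above a chosen threshold are discarded by marking them as missing (None), after which they could be handled by any missing-genotype strategy. The function name and the list-based data layout are hypothetical; the default threshold of 0.15 is the value the abstract recommends.

```python
from typing import List, Optional


def mask_low_gc(
    genotypes: List[int],
    gc_scores: List[float],
    threshold: float = 0.15,
) -> List[Optional[int]]:
    """Keep calls with a GC score above the threshold; mark the rest as missing."""
    if len(genotypes) != len(gc_scores):
        raise ValueError("genotypes and gc_scores must have the same length")
    return [g if score > threshold else None
            for g, score in zip(genotypes, gc_scores)]


calls = [0, 1, 2, 1]
scores = [0.90, 0.10, 0.15, 0.60]
print(mask_low_gc(calls, scores))                  # lenient threshold
print(mask_low_gc(calls, scores, threshold=0.70))  # more stringent threshold
```

A more stringent threshold discards more calls, which then have to be imputed; the study reports that this did not improve accuracy.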
