Genomic data and breeding value estimation in dairy cattle: theory, practice, problems.

I. Strandén, E.A. Mäntysaari

doi:10.1111/jbg.12218

Abstract

Dairy cattle evaluations have been forerunners in innovations and new ideas. Breeding value estimation models have seen revolutionary changes in each decade of their century-long history. An important framework is the mixed-model equations (MME) used to solve for BLUPs [Henderson (1973) Proc. Symp. in Honor of Dr Jay L. Lush, pp. 10–41]. Complex models have been implemented thanks to clever, and also elegant, innovations by the animal breeding community, such as simple rules to build the inverse of the relationship matrix [Henderson (1976) Biometrics, 32, 69–83] and the iteration on data concept [Schaeffer and Kennedy (1986) J. Dairy Sci., 69, 575–579; Misztal (1987) J. Dairy Sci., 70, 716–723]. These innovations have been supported by a steady increase in computing power. An important reason for the success has been the ability of numerical approaches to take advantage of sparseness in the problems. By sparseness, we mean that even though the number of unknowns to be solved is now hundreds of millions, statistical models have only a few unknown effects per observation and the computations can exploit the sparse coefficient matrix of the MME. Genomic information is a new, fast-growing source of data for dairy cattle evaluations. Genomic information is dense, unlike the traditional data used in the evaluations. Each genotyped animal has several thousand marker genotypes. There are alternative ways to use genomic data in genetic evaluations. In the multiple-step approach, genomic information is used in modelling pre-adjusted observations. The first models predicted SNP marker effects via SNP-BLUP or Bayesian approaches [Meuwissen et al. (2001) Genetics, 157, 1819–1829]. VanRaden [(2008), J. Dairy Sci., 91, 4414–4423] bridged the gap between SNP-BLUP and selection index with the genomic relationship matrix, G. The alternatives were shown to be equivalent by Strandén and Garrick [(2009) J. Dairy Sci., 92, 2971–2975].
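To make the genomic relationship matrix concrete, the following is a minimal sketch of the form commonly attributed to VanRaden (2008), G = ZZ' / (2 Σ p_j (1 − p_j)), where Z holds the marker genotypes (coded 0, 1 or 2) centred by twice the allele frequency. The function name, the toy genotypes and the fallback of estimating allele frequencies from the sample itself are assumptions made for illustration; they are not taken from the paper.

```python
def genomic_relationship_matrix(genotypes, allele_freqs=None):
    """Sketch of G = ZZ' / (2 * sum_j p_j * (1 - p_j)).

    genotypes    : list of rows, one per animal, coded 0/1/2 allele copies.
    allele_freqs : allele frequency per marker; if None, they are estimated
                   from the genotyped sample (an assumption of this sketch,
                   base-population frequencies are usually preferred).
    """
    n_animals = len(genotypes)
    n_markers = len(genotypes[0])

    if allele_freqs is None:
        allele_freqs = [
            sum(row[j] for row in genotypes) / (2.0 * n_animals)
            for j in range(n_markers)
        ]

    # Centre each genotype by twice the allele frequency of its marker.
    z = [
        [row[j] - 2.0 * allele_freqs[j] for j in range(n_markers)]
        for row in genotypes
    ]

    # Scaling factor: expected heterozygosity summed over markers.
    scale = 2.0 * sum(p * (1.0 - p) for p in allele_freqs)
    if scale == 0.0:
        raise ValueError("all markers are monomorphic; G is undefined")

    # Dense n x n matrix: every pair of genotyped animals gets a coefficient.
    return [
        [
            sum(z[i][k] * z[j][k] for k in range(n_markers)) / scale
            for j in range(n_animals)
        ]
        for i in range(n_animals)
    ]


# Toy data: 3 animals, 4 markers (hypothetical values).
toy_genotypes = [
    [0, 1, 2, 1],
    [1, 1, 0, 2],
    [2, 0, 1, 1],
]
G = genomic_relationship_matrix(toy_genotypes)
```

Even in this toy case every element of G is generally nonzero, which is the density the abstract contrasts with the sparse pedigree-based computations.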
Solutions from a simple MME where the relationship matrix A is replaced by G have become almost an industry standard. However, the multiple-step approach has a crucial shortcoming: when selection is based on breeding values predicted with genomic information, the traditional evaluation cannot account for the selection and bias will be introduced. This would reduce the value of both the genetic and the genomic evaluations. A better option is to integrate genomic data into animal model evaluations. Single-step genomic BLUP (ssGBLUP) is a theoretically justified approach [Christensen and Lund (2010) Genet. Sel. Evol., 42, 2; Misztal et al. (2010) J. Dairy Sci., 93 (Suppl. 1), 533] for including genomic information in evaluations where some animals have only pedigree information and some have both pedigree and genotypes. The marker information allows a better description of the covariance structure between genotyped animals. In practice, ssGBLUP requires the computation of the inverses of both the genomic relationship matrix G and the pedigree-based relationship matrix A22 for the genotyped animals. These two matrices are dense, with most coefficients being nonzero, unlike the sparse inverse of the pedigree relationship matrix. In solving the traditional MME by iterative methods, the computational load is linearly related to the amount of data. In solving the MME of ssGBLUP, the computational task is quadratically related to the number of genotyped animals. Due to lower laboratory costs, the number of genotyped animals is now increasing rapidly and the computational challenges in ssGBLUP evaluations are escalating. No doubt, we will need further innovations when the number of genotyped animals exceeds 150 000. An alternative route to ssGBLUP has been suggested by Fernando et al. [(2014) Genet. Sel. Evol., 46, 50]. They propose to explicitly impute genomic information for animals based on their pedigree relationships to genotyped animals.
Thereafter, the computations can be performed almost as if all the animals were genotyped, as with SNP-BLUP. The approach is very appealing because no genomic relationship matrix is required, and instead of breeding values, the marker effects are estimated. This circumvents some of the aforementioned computational problems of ssGBLUP. However, the Achilles' heel is the enormous storage space needed for the imputation outcomes of all non-genotyped animals. Implicit imputations during the iteration seem a better choice. In these, the actual genotypes are not assigned to non-genotyped animals in the pedigree, but instead, only a number of sparse matrix solves are needed in each round of iteration. The size of the extra equation system solved is large, that is, the number of non-genotyped animals, but due to sparseness of information, it can be solved efficiently. However, because the extra solve is needed for each effect in the model, both fixed and random effects, it increases computation time per round of iteration, particularly in complicated multitrait models. In recent years, many of us have tried to find a more efficient approach to solve the original ssGBLUP [Legarra and Ducrocq (2012) J. Dairy Sci., 95, 4629–4645; Liu et al. (2014) J. Dairy Sci., 97, 5833–5850]. The APY algorithm [Misztal et al. (2014) J. Dairy Sci., 97, 3943–3952] is an approach that yields sparseness in the computations through a simplified G matrix. Essentially, a set of ‘core animals’ have dense exact genomic information, while young animals have genomic relationships only through the core animals. Therefore, in the inverse of G_APY, the part pertaining to young animals is diagonal. The success of the APY algorithm depends on two choices. First, an appropriate number of core animals must be decided. Second, the right set of core animals must be chosen.
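The structure of the APY inverse described above can be written out explicitly. The following is a sketch in notation assumed here (subscript c for core animals, n for non-core or young animals), following the form usually given for the algorithm of Misztal et al. (2014) rather than anything stated in this abstract:

```latex
\mathbf{G}_{\mathrm{APY}}^{-1} =
\begin{bmatrix} \mathbf{G}_{cc}^{-1} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{bmatrix}
+
\begin{bmatrix} -\mathbf{G}_{cc}^{-1}\mathbf{G}_{cn} \\ \mathbf{I} \end{bmatrix}
\mathbf{M}_{nn}^{-1}
\begin{bmatrix} -\mathbf{G}_{nc}\mathbf{G}_{cc}^{-1} & \mathbf{I} \end{bmatrix},
\qquad
m_{ii} = g_{ii} - \mathbf{g}_{ic}\,\mathbf{G}_{cc}^{-1}\,\mathbf{g}_{ci}
```

Only the core block G_cc requires a dense inverse. M_nn is diagonal with elements m_ii, so it is inverted element by element, and each m_ii must be nonzero for the inverse to exist.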
Note that when the number of animals in the core is close to the number of markers, or even higher, the approach results in a division by zero while solving the diagonals of young animals in the inverse of G_APY. Hence, the number of core animals is often much less than the number of markers, resulting in the much desired sparseness for the computations. Genetically heterogeneous multibreed populations are a major challenge in building the genomic relationship matrix, particularly with APY. When a population consists of many breeds, each breed should be represented in the APY core. As the number of markers gives the upper limit to the number of core animals, a suitable core set may be difficult to find. In general, the typical genomic relationship matrix [VanRaden (2008) J. Dairy Sci., 91, 4414–4423] is a kind of distance matrix of individuals with respect to (base) population allele frequencies. When the population consists of several breeds, the choice of allele frequencies requires special attention [Makgahlela et al. (2013) J. Dairy Sci., 96, 5364–5375]. This is very important in ssGBLUP, where the genomic and pedigree relationship matrices are combined and assumed to describe the same genetic relationships but with different sources of information. Genetic groups are often included in the evaluations. They should also be accounted for in the genomic relationship matrix [Misztal et al. (2013) J. Anim. Breed. Genet., 130, 252–258]. We have observed that failure to properly treat the genetic groups and allele frequencies leads to poor convergence of the preconditioned conjugate gradient method and to deterioration in the estimated breeding values. These are, however, only two (unsolved) questions in the use of the genomic relationship matrix in ssGBLUP. For the pedigree-based relationship matrix, we have arrived at an unequivocal formulation of elements, inverse elements and genetic groups, while there still seem to be challenges in constructing the genomic relationship matrix.
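The way the two relationship matrices are combined in ssGBLUP can be made explicit. The following is a sketch in commonly used notation, where the partitioning (subscript 1 for non-genotyped and 2 for genotyped animals) and the symbol H for the combined relationship matrix are assumptions introduced here, following the form usually given in the single-step literature:

```latex
\mathbf{H}^{-1} = \mathbf{A}^{-1} +
\begin{bmatrix}
\mathbf{0} & \mathbf{0} \\
\mathbf{0} & \mathbf{G}^{-1} - \mathbf{A}_{22}^{-1}
\end{bmatrix}
```

Because G^{-1} and A22^{-1} enter as a difference, G and A22 need to refer to the same base population and scale, which is one way to see why the choice of allele frequencies and the treatment of genetic groups matter.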
Genotyping is becoming a routine procedure, and most likely in the near future all newborn animals will be genotyped. Hence, do we even need ssGBLUP, or could we just move to SNP-BLUP, as was suggested by Larry Schaeffer (Is the animal model obsolete? AGDG-list; 2014)? In the pedigree world, we are used to including data and animals from several generations. It is also known that marker information can only predict consistently across a few generations. So, in the coming years, should our predictions be based on dense systems of equations having data only from a couple of generations? This might happen in species with a short generation interval and a high reproduction rate (poultry, pigs, etc.), but in dairy cattle, we may see that the pedigree will live a long life.

Journal: Journal of animal breeding and genetics = Zeitschrift fur Tierzuchtung und Zuchtungsbiologie
Publication Date: May 12, 2016
Citations: 3