Performance of penalized maximum likelihood in estimation of genetic covariances matrices

Karin Meyer

doi:10.1186/1297-9686-43-39

Abstract

Background: Estimation of genetic covariance matrices for multivariate problems comprising more than a few traits is inherently problematic, since sampling variation increases dramatically with the number of traits. This paper investigates the efficacy of regularized estimation of covariance components in a maximum likelihood framework, imposing a penalty on the likelihood designed to reduce sampling variation. In particular, penalties that "borrow strength" from the phenotypic covariance matrix are considered.

Methods: An extensive simulation study was carried out to investigate the reduction in average 'loss', i.e. the deviation in estimated matrices from the population values, and the accompanying bias for a range of parameter values and sample sizes. A number of penalties are examined, penalizing either the canonical eigenvalues or the genetic covariance or correlation matrices. In addition, several strategies to determine the amount of penalization to be applied, i.e. to estimate the appropriate tuning factor, are explored.

Results: It is shown that substantial reductions in loss for estimates of genetic covariance can be achieved for small to moderate sample sizes. While no penalty performed best overall, penalizing the variance among the estimated canonical eigenvalues on the logarithmic scale or shrinking the genetic towards the phenotypic correlation matrix appeared most advantageous. Estimating the tuning factor using cross-validation resulted in a loss reduction 10 to 15% less than that obtained if population values were known. Applying a mild penalty, chosen so that the deviation in likelihood from the maximum was non-significant, performed as well if not better than cross-validation and can be recommended as a pragmatic strategy.

Conclusions: Penalized maximum likelihood estimation provides the means to 'make the most' of limited and precious data and facilitates more stable estimation for multi-dimensional analyses. It should become part of our everyday toolkit for multivariate estimation in quantitative genetics.

Highlights

Estimation of genetic parameters, i.e. the partitioning of phenotypic variation into (co)variances due to genetic effects and other sources, is one of the basic tasks in quantitative genetics.
While maximum likelihood (ML) based methods of estimation make efficient use of all the data and readily allow estimates of covariance matrices to be constrained to the parameter space [5], the problems of sampling variation remain.
Mean percentage reduction in average loss (PRIAL) values across all cases for individual covariance matrices and all penalties considered are summarized in Table 1 for a sample size of s = 100.
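
As a rough illustration of how a PRIAL value can be computed from simulation replicates, the sketch below takes the mean loss of unpenalized and penalized estimates and expresses the reduction as a percentage. The choice of entropy loss as the measure of deviation from the population matrix is an assumption made here for illustration, and the function names `entropy_loss` and `prial` are hypothetical; the text above only specifies that loss is the deviation of estimated matrices from the population values.

```python
import numpy as np


def entropy_loss(sigma, sigma_hat):
    """Entropy loss between a population matrix and an estimate.

    Assumed loss function (not taken from the text above):
    tr(Sigma^-1 Sigma_hat) - log|Sigma^-1 Sigma_hat| - q.
    It is zero when the estimate equals the population matrix.
    """
    q = sigma.shape[0]
    m = np.linalg.solve(sigma, sigma_hat)  # Sigma^-1 @ Sigma_hat
    _, logdet = np.linalg.slogdet(m)
    return np.trace(m) - logdet - q


def prial(loss_unpenalized, loss_penalized):
    """Percentage reduction in average loss across replicates."""
    mean_unpenalized = np.mean(loss_unpenalized)
    mean_penalized = np.mean(loss_penalized)
    return 100.0 * (1.0 - mean_penalized / mean_unpenalized)
```

A positive value indicates that, on average, penalized estimates were closer to the population matrix than unpenalized ones; a negative value indicates that penalization increased the average loss.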

Summary

Introduction

Estimation of genetic parameters, i.e. the partitioning of phenotypic variation into (co)variances due to genetic effects and other sources, is one of the basic tasks in quantitative genetics. Livestock improvement schemes consider a multitude of traits. This requires complex, multivariate analyses that consider more than just a few traits simultaneously. Eigenvalues of estimated covariance matrices tend to be more dispersed than the population eigenvalues, and a large proportion of the sampling variances of estimates of individual covariances can be attributed to this excess dispersion [3]. The effect is more pronounced the larger the matrix, the smaller the data set and the more similar the population eigenvalues are. Estimation of genetic covariance matrices for multivariate problems comprising more than a few traits is therefore inherently problematic, since sampling variation increases dramatically with the number of traits. Penalties that “borrow strength” from the phenotypic covariance matrix are considered.
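
The dependence of eigenvalue dispersion on sample size can be seen in a small simulation. The sketch below is only an illustration under simplifying assumptions: it uses ordinary sample covariance matrices of independent standard normal data (so all population eigenvalues equal 1) rather than genetic covariance matrices estimated from a mixed model, and the function name `mean_sorted_eigenvalues` and all settings are hypothetical.

```python
import numpy as np


def mean_sorted_eigenvalues(q, n, reps=500, seed=0):
    """Average the ordered eigenvalues of sample covariance matrices.

    Data are drawn from N(0, I_q), so every population eigenvalue is 1.
    This is a simplified stand-in, not a genetic covariance analysis.
    """
    rng = np.random.default_rng(seed)
    total = np.zeros(q)
    for _ in range(reps):
        x = rng.standard_normal((n, q))      # n records on q traits
        s = np.cov(x, rowvar=False)          # q x q sample covariance matrix
        total += np.linalg.eigvalsh(s)       # eigenvalues in ascending order
    return total / reps


small = mean_sorted_eigenvalues(q=8, n=20)
large = mean_sorted_eigenvalues(q=8, n=2000)
print("n = 20:  ", np.round(small, 2))
print("n = 2000:", np.round(large, 2))
```

With few records, the smallest average eigenvalues fall well below 1 and the largest well above it, while their mean stays close to 1; with many records the spread shrinks towards the population values.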

Methods

Results

Discussion

Conclusion