Abstract

It is of practical importance to ensure the quality of data from a milk-recording system before they are used for genetic evaluation. A procedure for detecting multivariate outliers, based on an approximation of the Mahalanobis distance, was developed and implemented for the Nordic Holstein and Red populations. The procedure targets the Nordic Cattle Genetic Evaluation yield model, a 9-trait model for milk, protein, and fat in the first 3 lactations. It relies on the phenotypic correlation structure as a function of days in milk (DIM) and on trait means and standard deviations computed within production year, lactation, and DIM. For each record in the data, a Mahalanobis distance was computed from the trait means and the covariance matrix for the corresponding production year, lactation, and DIM. Cutoff values for discarding multivariate outliers were investigated from 10 to 100 in steps of 10. Prediction accuracy was calculated as the Pearson correlation between estimated breeding values predicted from the full data set and those predicted from a reduced data set, for cows that had no records in the reduced data set and had 1 or more records deleted under the Mahalanobis distance editing rules. Averaged over all scenarios, deleting the multivariate outliers gave gains of 0.005 to 0.048 in prediction accuracy. The improvements were more pronounced for progeny of young bulls than for progeny of proven bulls. This multivariate outlier-detection procedure is easy to implement in routine genetic evaluation for different dairy cattle breeds; however, an optimal Mahalanobis distance cutoff must be defined to achieve an acceptable compromise between genetic evaluation accuracy and data deletion.
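The per-record editing step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it computes the exact squared Mahalanobis distance D² rather than the approximation used in the study, builds each group's covariance matrix as diag(sd)·R·diag(sd) from the within-group standard deviations and a DIM-specific phenotypic correlation matrix R, assumes the cutoff is compared against D², and handles missing traits by using only the observed sub-vector. The function names and the (year, lactation, DIM) group keys are hypothetical.

```python
import numpy as np


def covariance_from_sd_and_corr(sd, corr):
    """Build a covariance matrix as diag(sd) @ corr @ diag(sd).

    Assumption: each group's covariance is formed from within-group
    standard deviations and a DIM-specific phenotypic correlation matrix.
    """
    sd = np.asarray(sd, dtype=float)
    # Element (i, j) equals sd[i] * sd[j] * corr[i, j].
    return np.outer(sd, sd) * np.asarray(corr, dtype=float)


def squared_mahalanobis(x, mean, cov):
    """Squared Mahalanobis distance of one record, using observed traits only.

    Missing traits (NaN) are dropped together with the matching rows and
    columns of the covariance matrix. Returns NaN if no trait is observed.
    """
    x = np.asarray(x, dtype=float)
    mean = np.asarray(mean, dtype=float)
    observed = ~np.isnan(x)
    if not observed.any():
        return float("nan")
    d = x[observed] - mean[observed]
    sub_cov = np.asarray(cov, dtype=float)[np.ix_(observed, observed)]
    # Solve sub_cov @ z = d instead of forming an explicit inverse.
    return float(d @ np.linalg.solve(sub_cov, d))


def flag_outliers(records, group_stats, cutoff):
    """Return one boolean per record: True if its D^2 exceeds `cutoff`.

    records: list of (group_key, trait_vector) pairs, where group_key is a
        hypothetical (production_year, lactation, dim) tuple.
    group_stats: dict mapping group_key -> (mean, sd, corr).
    Records with no observed traits yield NaN and are never flagged,
    because NaN > cutoff is False.
    """
    flags = []
    for key, x in records:
        mean, sd, corr = group_stats[key]
        cov = covariance_from_sd_and_corr(sd, corr)
        d2 = squared_mahalanobis(x, mean, cov)
        flags.append(bool(d2 > cutoff))
    return flags
```

A cutoff scan like the one in the study could loop over `range(10, 101, 10)` and count how many records `flag_outliers` marks at each value; lower cutoffs delete more data. Using `np.linalg.solve` rather than inverting the covariance matrix is the usual choice for numerical stability when traits are highly correlated.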
