Multi-population genomic prediction using a multi-task Bayesian learning model.

Liuhong Chen, Changxi Li, Stephen Miller, Flavio Schenkel

doi:10.1186/1471-2156-15-53

Abstract

Background: Genomic prediction in multiple populations can be viewed as a multi-task learning problem in which the tasks are to derive prediction equations for each population, and in which performance can be improved by sharing information across populations. The goal of this study was to develop a multi-task Bayesian learning model for multi-population genomic prediction with a strategy to share information effectively across populations. Simulation studies and real data from Holstein and Ayrshire dairy breeds with phenotypes on five milk production traits were used to evaluate the proposed multi-task Bayesian learning model and to compare it with a single-task model and a simple data pooling method.

Results: A multi-task Bayesian learning model was proposed for multi-population genomic prediction. Information was shared across populations through a common set of latent indicator variables, while SNP effects were allowed to vary between populations. Both simulation studies and real data analysis showed the effectiveness of the multi-task model in improving genomic prediction accuracy for the smaller Ayrshire breed. Simulation studies suggested that the multi-task model was most effective when the number of QTL was small (n = 20), with an increase in accuracy of up to 0.09 when QTL effects were weakly correlated between the two populations (ρ = 0.2), and up to 0.16 when QTL effects were highly correlated (ρ = 0.8). When QTL genotypes were included for training and validation, the improvements were 0.16 and 0.22 for the low- and high-correlation scenarios, respectively. When the number of QTL was large (n = 200), the improvement was small, with a maximum of 0.02 when QTL genotypes were not included for genomic prediction. A reduction in accuracy was observed for the simple pooling method when the number of QTL was small and the correlation of QTL effects between the two populations was low.

For the real data, the multi-task model achieved an increase in accuracy of between 0 and 0.07 in the Ayrshire validation set when 28,206 SNPs were used, while the simple data pooling method resulted in a reduction in accuracy for all traits except protein percentage. When 246,668 SNPs were used, the accuracy achieved with the multi-task model increased by 0 to 0.03, while the pooling method reduced accuracy by 0.01 to 0.09. In the Holstein population, the three methods had similar performance.

Conclusions: Results in this study suggest that the proposed multi-task Bayesian learning model for multi-population genomic prediction is effective and has the potential to improve the accuracy of genomic prediction.
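The sharing mechanism described in the Results (a common set of latent indicator variables, with SNP effects free to differ between populations) can be written down roughly as follows. This is only a sketch: the notation, the Bernoulli prior on the indicators, and the normal priors on effects and residuals are assumptions made for illustration, since the abstract does not state the paper's exact prior specification.

```latex
% Assumed notation: k indexes populations (e.g. Holstein, Ayrshire),
% j indexes SNPs, m is the number of SNPs.
\begin{aligned}
\mathbf{y}_k &= \mathbf{1}\mu_k + \sum_{j=1}^{m} \mathbf{x}_{jk}\,\gamma_j\,b_{jk} + \mathbf{e}_k,
  \qquad k = 1, 2 \\
\gamma_j &\sim \mathrm{Bernoulli}(\pi)
  && \text{(indicator shared by both populations)} \\
b_{jk} &\sim \mathcal{N}\!\left(0, \sigma^2_{b,k}\right)
  && \text{(SNP effect specific to population } k\text{)} \\
\mathbf{e}_k &\sim \mathcal{N}\!\left(\mathbf{0}, \mathbf{I}\,\sigma^2_{e,k}\right)
\end{aligned}
```

Under this reading, the single index on γ_j is what couples the two tasks: whether a SNP is included in the model is decided jointly from both populations, while the size and sign of its effect are estimated separately within each.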

Highlights

Genomic prediction in multiple populations can be viewed as a multi-task learning problem in which the tasks are to derive prediction equations for each population, and in which performance can be improved by sharing information across populations.
In the Ayrshire validation set, the multi-task Bayesian learning model performed best among the three methods within each single nucleotide polymorphism (SNP) panel used, under scenarios with either a low (ρ = 0.2) or high (ρ = 0.8) correlation of simulated quantitative trait loci (QTL) effects between the Ayrshire and Holstein populations.
The pooling method produced substantially lower accuracy than the multi-task model in the Ayrshire validation set, especially when QTL effects were less strongly correlated between the two populations.
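To make the simulated scenarios concrete, the following minimal sketch draws QTL effects for two populations so that the same loci are QTL in both (a shared indicator) while the effect sizes are correlated at a chosen ρ. It is not the authors' simulation procedure: the function names, the standard-normal effect distribution, and the number of SNPs in the usage example are all assumptions made for illustration.

```python
import math
import random


def simulate_qtl_effects(n_snps, n_qtl, rho, seed=0):
    """Hypothetical helper: shared QTL positions, population-specific effects.

    Returns (indicators, effects_a, effects_b), three lists of length n_snps.
    """
    rng = random.Random(seed)
    # Shared indicator: the same loci are QTL in both populations.
    qtl_loci = set(rng.sample(range(n_snps), n_qtl))
    indicators = [j in qtl_loci for j in range(n_snps)]

    effects_a, effects_b = [], []
    for is_qtl in indicators:
        if not is_qtl:
            effects_a.append(0.0)
            effects_b.append(0.0)
            continue
        # Two standard normal effects with correlation rho (assumed distribution).
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        effects_a.append(z1)
        effects_b.append(rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
    return indicators, effects_a, effects_b


def correlation(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # 20 QTL with highly correlated effects; 1000 SNPs is an arbitrary choice.
    ind, eff_a, eff_b = simulate_qtl_effects(n_snps=1000, n_qtl=20, rho=0.8)
    qtl_a = [a for i, a in zip(ind, eff_a) if i]
    qtl_b = [b for i, b in zip(ind, eff_b) if i]
    print("QTL count:", sum(ind))
    print("sample correlation of QTL effects:", round(correlation(qtl_a, qtl_b), 2))
```

With only 20 QTL the sample correlation can differ noticeably from the target ρ, which is one reason a small-QTL scenario is sensitive to how information is shared between populations.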

Summary

Introduction

Genomic prediction in multiple populations can be viewed as a multi-task learning problem in which the tasks are to derive prediction equations for each population, and in which performance can be improved by sharing information across populations. In Holstein dairy cattle, genomic prediction has been successfully applied using the Illumina BovineSNP50 single nucleotide polymorphism (SNP) panel [4,5]. For smaller dairy cattle populations such as Ayrshire, acquiring a large number of animals for the training data set remains a challenge. Brondum et al. [10] proposed an approach called BayesRS for multi-population genomic prediction, in which location-specific genetic variances derived in one population were used as priors for another population. They found that, for some traits, BayesRS might be advantageous compared with pooling training data sets for distantly related populations, but that for closely related populations the method did not perform better than pooling the data.

Journal: BMC Genetics
Publication Date: Jan 1, 2014
Citations: 23
License type: CC BY
