To derive suitable breeding strategies for complex traits, their genetic architecture and population-specific parameters must be thoroughly investigated. Large samples of genotyped individuals are required to produce reliable results and to draw adequate inferences. Whereas such data are available for many populations of cattle, pigs and chickens, they are rare for sheep. Except in some sheep breeding nations where genomic selection is applied, most datasets of genotyped sheep come from experiments that often have small sample sizes. Pooling data from different breeds and crosses is one way to increase the sample size for genomic analyses, but it is challenging because of the multi-breed composition of the resulting datasets. Population structure and genetic diversity affect the results of genomic analyses but are often not studied in the populations in which those analyses are conducted. Genetically diverse populations, as found in sheep, often compromise the analyses and the interpretation of results. Here, we investigated the genetic diversity and population structure of purebred Merino and several Merino crossbred lambs, as well as of their pooled data, and conducted separate and joint genetic analyses. Linkage disequilibrium was studied in the pooled data as well as in the single datasets. Consistency of phase was calculated among crosses. Population structure was assessed by principal component analysis. Variance components were estimated and genome-wide association studies were performed for three carcass traits. All investigations were based on 50k genotype data. Large genetic diversity was found between and also within crosses.
The populations showed low levels of linkage disequilibrium (average r² between 0.155 and 0.171 within crosses and 0.138 in the pooled dataset for inter-marker distances up to 0.059 Mbp). Linkage disequilibrium decayed quickly with increasing inter-marker distance, and this decay was particularly pronounced when the data were pooled. The population genetic structure of the purebred Merino sample was similar to that of the Merino crosses, and the genetic differentiation between the populations was small. Variance components could be estimated with acceptable standard errors only for some of the single crosses, and hence genome-wide association studies could not be performed for all crosses. Pooling data drastically reduced standard errors in variance component estimation but only marginally improved the power of gene mapping and did not improve its precision. The heritabilities estimated in the pooled dataset were 0.30, 0.18 and 0.18 for back length, shoulder width and leg width, respectively. Only one genome-wide significant SNP, for shoulder width, was detected. Further, across all traits and datasets, 25 SNPs were found to be putatively trait-associated. Larger datasets and denser marker panels are required to use such datasets for gene mapping at high power and precision, and they would also improve population genetic investigations. If possible, and to avoid overly complex models, data should be pooled by combining several populations of the same breed rather than multiple crosses.
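
As an illustration of two quantities mentioned above, the following minimal sketch computes r² between two SNPs and a consistency-of-phase value between two populations. It rests on assumptions that the abstract does not confirm: genotypes are coded as 0/1/2 allele counts, r is taken as the Pearson correlation of those counts (a common approximation for unphased data), and consistency of phase is taken as the correlation of signed r values over the same marker pairs in two populations. The function names and toy genotypes are hypothetical and are not taken from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A marker without variation carries no LD information.
        raise ValueError("monomorphic input: correlation undefined")
    return sxy / sqrt(sxx * syy)


def ld_r(geno_a, geno_b):
    """Signed genotypic correlation between two SNPs coded 0/1/2 (assumed coding)."""
    return pearson(geno_a, geno_b)


def ld_r2(geno_a, geno_b):
    """Squared genotypic correlation, used here as an approximation of LD r²."""
    return ld_r(geno_a, geno_b) ** 2


def consistency_of_phase(r_pop1, r_pop2):
    """Correlation of signed r values for the same marker pairs in two populations."""
    return pearson(r_pop1, r_pop2)


# Hypothetical toy genotypes for six animals at two SNPs.
snp1 = [0, 1, 2, 1, 0, 2]
snp2 = [0, 1, 2, 2, 0, 1]
print(ld_r2(snp1, snp2))
```

In practice the signed r values would be collected for all marker pairs within a distance bin in each population before `consistency_of_phase` is applied to the two resulting lists.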