Abstract
Private and public breeding programs, as well as companies and universities, have developed different genomics technologies that have generated unprecedented amounts of sequence data, which bring new challenges in terms of data management, query, and analysis. The magnitude and complexity of these datasets also offer an opportunity to use the available data as a whole. Detailed phenotype data, combined with increasing amounts of genomic data, have enormous potential to accelerate the identification of key traits and to improve our understanding of quantitative genetics. Data harmonization enables cross-national and international comparative research, facilitating the extraction of new scientific knowledge. In this paper, we address the complex issue of combining high-dimensional and unbalanced omics data. More specifically, we propose a covariance-based method for combining partial datasets in the genotype-to-phenotype spectrum. This method can be used to combine partially overlapping relationship/covariance matrices. Here, we show with applications that our approach might be advantageous compared with feature-imputation-based approaches; we demonstrate how this method can be used in genomic prediction using heterogeneous marker data and also how to combine the data from multiple phenotypic experiments to make inferences about previously unobserved trait relationships. Our results demonstrate that it is possible to harmonize datasets to improve available information across gene-banks, data repositories, or other data resources.
Highlights
The rapid scientific progress in these genomic approaches is due to the decrease in genotyping costs brought about by the development of next-generation sequencing platforms since 2007 (Mardis, 2008a; Mardis, 2008b).
Let $G_{a_1}, G_{a_2}, \ldots, G_{a_m}$ be the relationship matrices for genotypes in sets $a_1, a_2, \ldots, a_m$. We want to estimate the overall relationship matrix $S$ for the $n$ genotypes using $G_{a_1}, G_{a_2}, \ldots, G_{a_m}$. If we focus on one single relationship matrix $G_{a_i}$, we drop the subscript and write $G_a$.
Anchoring Independent Pedigree-Based Relationship Matrices Using a Genotypic Relation Matrix: In this application, we demonstrate that genomic relationship matrices can be used to connect several pedigree-based relationship matrices by the Wishart-EM-Algorithm.
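To make the setup in the highlights concrete, the following is a minimal sketch of an EM-style procedure for combining partially overlapping relationship matrices. In each iteration, every partial matrix $G_a$ is completed to an $n \times n$ matrix using its conditional expectation under the current estimate of $S$, and the completed matrices are averaged. The function name `combine_relationship_matrices`, the identity starting value, the fixed iteration count, and the omission of a degrees-of-freedom parameter are assumptions made for illustration; this is not presented as the authors' exact Wishart-EM-Algorithm.

```python
import numpy as np


def combine_relationship_matrices(partials, n, n_iter=100, init=None):
    """EM-style combination of partial relationship matrices (illustrative sketch).

    partials: list of (idx, G) pairs, where idx holds the integer positions
              (0..n-1) of the genotypes in set a and G is the matching
              len(idx) x len(idx) relationship matrix G_a.
    n:        total number of genotypes.
    init:     optional starting value for S (assumed default: identity).
    """
    S = np.eye(n) if init is None else np.array(init, dtype=float)
    for _ in range(n_iter):
        S_new = np.zeros((n, n))
        for idx, G in partials:
            a = np.asarray(idx)                    # genotypes observed in G_a
            b = np.setdiff1d(np.arange(n), a)      # genotypes missing from G_a
            G = np.asarray(G, dtype=float)
            S_aa = S[np.ix_(a, a)]
            S_ab = S[np.ix_(a, b)]
            S_bb = S[np.ix_(b, b)]
            # B = S_ba S_aa^{-1}: regression of unobserved on observed genotypes
            B = np.linalg.solve(S_aa, S_ab).T
            # Conditional covariance of the unobserved block given the observed one
            S_cond = S_bb - B @ S_ab
            # Complete G_a to n x n using the current estimate of S
            completed = np.zeros((n, n))
            completed[np.ix_(a, a)] = G
            completed[np.ix_(b, a)] = B @ G
            completed[np.ix_(a, b)] = (B @ G).T
            completed[np.ix_(b, b)] = S_cond + B @ G @ B.T
            S_new += completed
        S = S_new / len(partials)
        S = (S + S.T) / 2                          # guard against round-off asymmetry
    return S


# Toy usage: three overlapping genotype sets drawn from one simulated matrix.
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 8))
S_sim = A @ A.T / 8                                # simulated "true" relationships
sets = [[0, 1, 2], [2, 3, 4], [0, 3, 4]]
partials = [(s, S_sim[np.ix_(s, s)]) for s in sets]
S_hat = combine_relationship_matrices(partials, n=5)
print(np.round(S_hat, 2))
```

In this sketch, entries of $S$ for pairs of genotypes that never appear together in any $G_a$ are inferred only through the genotypes the sets share, so the overlap between sets is what ties the partial matrices together.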
Summary
The rapid scientific progress in these genomic approaches is due to the decrease in genotyping costs brought about by the development of next-generation sequencing platforms since 2007 (Mardis, 2008a; Mardis, 2008b). Genomic selection, i.e., predicting an organism’s phenotype using genetic information (Meuwissen et al, 2001), is currently used by many breeding companies because it improves three out of the four factors affecting the breeder's equation (Hill and Mackay, 2004). It reduces generation number, improves accuracy of selection, and increases selection intensity for a fixed budget when compared with marker-assisted selection or phenotypic selection (Heffner et al, 2010; Heffner et al, 2011; de los Campos et al, 2013; Desta and Ortiz, 2014; Juliana et al, 2018). Genome-wide association mapping studies, which originated in human genetics (Bodmer, 1986; Risch and Merikangas, 1996; Visscher et al, 2017), have become routine in plant breeding (Gondro et al, 2013).
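For reference, the breeder's equation is commonly written in the following form. The symbols are standard textbook notation and are an assumption here, since the text above does not define them:

$$\Delta G = \frac{i \, r \, \sigma_A}{L}$$

Here $\Delta G$ is the expected genetic gain per unit of time, $i$ is the selection intensity, $r$ is the accuracy of selection, $\sigma_A$ is the additive genetic standard deviation, and $L$ is the length of a breeding cycle. The three factors named above correspond to $L$, $r$, and $i$, while $\sigma_A$ is the fourth.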