Abstract
Background: Genomic prediction is an advanced method for estimating genetic values that is widely used for genetic evaluation in animals and disease-risk prediction in humans. It estimates genetic values from SNPs distributed across the genome instead of from pedigree. Its key step is constructing the genomic relationship matrix (GRM) from genome-wide SNPs. However, calculating the GRM usually requires a large amount of computer memory, especially when the number of SNPs and the sample size are large, and it can become computationally prohibitive even on supercomputer clusters. We developed an integrative algorithm, ICGRM, to compute the GRM. Rather than calculating the GRM for the whole genome at once, ICGRM divides the genome-wide SNPs into several segments of freely chosen size and computes the GRM-related summary statistics for each segment, which requires little RAM; it then integrates these summary statistics to produce the GRM for the whole genome.

Results: In our simulated dataset, splitting the genome-wide SNPs into 5 to 200 segments by SNP count reduced the memory used by ICGRM about 15-fold (from 218Gb to 14Gb), making the computation feasible on almost any kind of computer server. ICGRM is implemented in C/C++ and is freely available at https://github.com/mingfang618/CLGRM.

Conclusions: ICGRM is computationally efficient software for building the GRM and can be used for large datasets.
Highlights
Genomic prediction is an advanced method for estimating genetic values that is widely used for genetic evaluation in animals and disease-risk prediction in humans
The proposed method first splits the genome-wide SNPs into d segments; for each segment, it calculates the summary statistics related to the genomic relationship matrix (GRM); it then combines these summary statistics to produce the GRM for the whole genome
⋯ For each segment s = 1, 2, ⋯, d (with k0 = 1), we calculate the corresponding summary statistics. After calculating these summary statistics, we save them to disk and use them to calculate the GRM for the whole genome using Eq. (3)
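The split-and-integrate idea can be illustrated with a short sketch. This is not the ICGRM implementation and does not reproduce the paper's Eq. (3): it assumes a VanRaden-style GRM, G = ZZᵀ / Σ 2p_j(1 − p_j), where Z holds genotypes centered by twice the allele frequency p_j. Under that assumption, each segment contributes an n × n cross-product and a scalar, and both simply add up across segments. The function names `segment_stats`, `combine_stats`, and `grm_by_segments` are hypothetical, and pure Python lists are used only to keep the example dependency-free.

```python
def segment_stats(genotypes, start, end):
    """Summary statistics for the SNP columns in [start, end).

    genotypes: list of n rows, each a list of m allele counts (0, 1 or 2).
    Returns (W, S): W is the n x n cross-product of centered genotypes and
    S is the sum of 2 * p_j * (1 - p_j) over the segment.
    ASSUMPTION: VanRaden-style centering and scaling, not the paper's Eq. (3).
    """
    n = len(genotypes)
    W = [[0.0] * n for _ in range(n)]
    S = 0.0
    for j in range(start, end):
        col = [row[j] for row in genotypes]
        p = sum(col) / (2.0 * n)           # allele frequency of SNP j
        z = [x - 2.0 * p for x in col]     # centered genotypes
        S += 2.0 * p * (1.0 - p)
        for a in range(n):
            for b in range(n):
                W[a][b] += z[a] * z[b]
    return W, S


def combine_stats(stats):
    """Add up per-segment (W, S) pairs and scale to obtain the GRM."""
    n = len(stats[0][0])
    total_W = [[0.0] * n for _ in range(n)]
    total_S = 0.0
    for W, S in stats:
        total_S += S
        for a in range(n):
            for b in range(n):
                total_W[a][b] += W[a][b]
    if total_S == 0.0:
        raise ValueError("all SNPs are monomorphic; the GRM is undefined")
    return [[total_W[a][b] / total_S for b in range(n)] for a in range(n)]


def grm_by_segments(genotypes, d):
    """Split the m SNPs into d contiguous segments and integrate the results."""
    m = len(genotypes[0])
    bounds = [s * m // d for s in range(d + 1)]   # bounds[0] = 0, bounds[d] = m
    stats = [segment_stats(genotypes, bounds[s], bounds[s + 1]) for s in range(d)]
    return combine_stats(stats)


# Toy data: 3 individuals, 4 SNPs.
toy = [[0, 1, 2, 1],
       [1, 1, 0, 2],
       [2, 0, 1, 0]]
G_whole = grm_by_segments(toy, 1)   # one segment: whole genome at once
G_split = grm_by_segments(toy, 3)   # three segments, then integrated
```

For simplicity the sketch keeps the full genotype matrix in memory. To realise the memory saving, each segment's genotypes would be read separately and its statistics written to disk before integration, as the text describes; note also that allele frequencies here are computed per SNP from all individuals, so they do not depend on how the genome is segmented.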
Summary
Genomic prediction is an advanced method for estimating genetic values that is widely used for genetic evaluation in animals and disease-risk prediction in humans. It estimates genetic values from SNPs distributed across the genome instead of from pedigree. Rather than calculating the GRM for the whole genome at once, ICGRM divides the genome-wide SNPs into several segments of freely chosen size and computes the GRM-related summary statistics for each segment, which requires little RAM; it then integrates these summary statistics to produce the GRM for the whole genome. With the development of sequencing techniques, using genome-wide SNPs to calculate the similarities among individuals has become well established [2]; the pairwise kinship among individuals is usually described by a matrix called the genomic relationship matrix. Developing new software that addresses the memory cost of computing the GRM is therefore worthwhile