Abstract

Haplotype prediction models open many possibilities to improve the accuracy of genomic selection but require more data processing and computing time than single-SNP prediction models. To facilitate haplotype analysis for genomic prediction and estimation using structural and functional genomic information, we developed a computing pipeline to implement haplotype analysis with capabilities for preparation of input data for haplotype analysis, genomic prediction and estimation using GVCHAP, and analysis of GVCHAP results. Data preparation includes utility programs for haplotype imputing; defining haplotype blocks by a fixed number of SNPs, a fixed distance in base pairs per block, or user defined block lengths based on structural or functional genomic information or a mixture of both types of information; and defining haplotype genotypes within each haplotype block. GVCHAP is the main program for genomic prediction and estimation, calculates GREML (genomic restricted maximum likelihood) estimates of variance components and heritabilities, and calculates GBLUP (genomic best linear unbiased prediction) for additive and dominance values of single SNPs as well as additive values of haplotypes with reliability estimates for training and validation populations. A two-step strategy and a method of multi-node processing are implemented to remove the computing bottleneck due to the creation of genomic relationship matrices for large samples. The analysis of GVCHAP results includes calculation of observed prediction accuracies from validation studies and preparation of input files for graphical visualization of heritability estimates of haplotype blocks as well as estimates of SNP effects and heritabilities. The entire pipeline provides an efficient and versatile computing tool for identifying the most accurate haplotype model among many candidate haplotype models utilizing structural and functional genomic information for genomic selection.

Highlights

  • Current methods using single nucleotide polymorphism (SNP) markers for genomic evaluation mostly use single-SNP prediction models

  • Five input files are required for running GVCHAP: SNP genotypes, haplotype genotypes, phenotypes, SNP map file, and parameter file

  • The SNP genotypic files, phenotypic file and map file are provided by the user and remain unchanged for GVCHAP analysis

Read more

Summary

Introduction

Current methods using single nucleotide polymorphism (SNP) markers for genomic evaluation mostly use single-SNP prediction models. Compared to studies using single-SNP prediction models, only limited studies were available for using haplotype models in genomic evaluation (Calus et al, 2008; Villumsen et al, 2009; Mulder et al, 2010; Boichard et al, 2012; Cuyabano et al, 2015; Hess et al, 2017; Jónás et al, 2017; Jiang et al, 2018; Jan et al, 2019) Those studies achieved mixed results from little to substantial improvement in prediction accuracy due to the use of haplotypes relative to single-SNP models, and used haplotype blocking methods that are only a fraction of many possible haplotype models. The computing pipeline in this article provides a full-featured computing tool for genomic prediction and variance component estimation using haplotypes with capability to minimize the data processing work, reduce computing difficulty, and conduct haplotype genomic prediction and estimation in an automated fashion

Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call