Abstract

Pan-genomic open reading frames (ORFs) potentially carry information on protein-coding genes or coding variants in a population. In this study, we propose that pan-genomic ORFs can be used for estimating heritability and for genomic prediction. A Saccharomyces cerevisiae dataset with whole-genome SNPs, pan-genomic ORFs, and the copy numbers of those ORFs is used to test the effectiveness of ORF data as a predictor in three prediction models for 35 traits. Our results show that ORF-based heritability captures more genetic effects than SNP-based heritability for all traits. Compared to SNP-based genomic prediction (GBLUP), pan-genomic ORF-based genomic prediction (OBLUP) is distinctly more accurate for all traits, and on average the predictive ability more than doubles across traits. For four traits, prediction based on ORF copy number (CBLUP) is more accurate than OBLUP. When different numbers of isolates were used in the training sets for ORF-based prediction, the predictive abilities for all traits increased as more isolates were added, suggesting that with very large training sets the prediction accuracy will be in the range of the square root of the heritability. We conclude that pan-genomic ORFs have the potential to supplement single nucleotide polymorphisms in the estimation of heritability and in genomic prediction.

Highlights

  • Genome-wide single nucleotide polymorphisms (SNPs) were first proposed in 2001 to be used for predicting genetic values [1]

  • The properties of single nucleotide polymorphisms (SNPs) as a main source of genetic variability for estimation of heritability and genomic prediction have been widely studied over the past years

  • Three types of data (all common SNPs, pan-genomic open reading frames, and copy numbers of pan-genomic open reading frames) were used for principal component analysis (PCA) on the 787 diploid S. cerevisiae isolates [33]



Introduction

Genome-wide single nucleotide polymorphisms (SNPs) were first proposed in 2001 to be used for predicting genetic values [1]. By utilizing genome-wide SNP data, ‘genomic selection’ based on genomically predicted breeding values has triggered a revolution in animal and plant breeding. It has improved genetic progress by reducing generation intervals or increasing the predictive ability of breeding values [3,4,5]. In genomic prediction, the causal variant effects are estimated indirectly by modeling SNP markers that are in linkage disequilibrium (LD) with them [2]. The prediction accuracy depends strongly on the level of LD between SNP markers and causal variants, and the level of LD in turn depends on the relatedness of the individuals used [7]. Several factors inevitably cause the ‘still missing heritability’ problem when common SNPs exceeding a certain minor allele frequency (MAF) are used to estimate narrow-sense heritability [10]: for instance, the causal variants may not be in complete LD with the SNPs that have been genotyped, or rare variants of large effect may not be tagged by the common SNPs on genotyping arrays [11, 12].
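To make the idea of marker-based prediction concrete, the following is a minimal sketch of a BLUP-style prediction from a marker-derived relationship matrix. It is not the authors' pipeline: the function names `relationship_matrix` and `blup_predict`, the simulated data, the standardized-marker scaling of the relationship matrix, the use of the training mean as the only fixed effect, and the assumption that the heritability `h2` is already known (in practice it would be estimated, for example by REML) are all illustrative assumptions. The same code applies whether the marker matrix holds SNP genotypes or ORF presence/absence calls; only the input changes.

```python
import numpy as np

def relationship_matrix(markers):
    """Relationship matrix from an (n_individuals, n_markers) array.

    Assumption: columns are standardized and K = Z Z' / m, so the mean
    diagonal of K is 1. Other scalings are common in the literature.
    """
    markers = np.asarray(markers, dtype=float)
    std = markers.std(axis=0)
    keep = std > 0  # drop monomorphic markers, which carry no information
    z = (markers[:, keep] - markers[:, keep].mean(axis=0)) / std[keep]
    return z @ z.T / z.shape[1]

def blup_predict(K, y_train, train_idx, test_idx, h2):
    """Predict phenotypes of test individuals from training phenotypes.

    Simplification: the overall mean is estimated by the plain training
    average, and h2 is treated as known.
    """
    lam = (1.0 - h2) / h2  # ratio of residual to genetic variance
    mu = y_train.mean()
    K_tt = K[np.ix_(train_idx, train_idx)]
    K_st = K[np.ix_(test_idx, train_idx)]
    alpha = np.linalg.solve(K_tt + lam * np.eye(len(train_idx)), y_train - mu)
    return mu + K_st @ alpha

# Simulated example (hypothetical data, not the yeast dataset).
rng = np.random.default_rng(0)
n, m, h2 = 400, 50, 0.5
markers = rng.integers(0, 2, size=(n, m)).astype(float)  # e.g. presence/absence
effects = rng.normal(size=m)
g = markers @ effects
g = (g - g.mean()) / g.std()  # genetic values with unit variance
y = g + rng.normal(scale=np.sqrt((1.0 - h2) / h2), size=n)

train_idx = np.arange(300)
test_idx = np.arange(300, n)
K = relationship_matrix(markers)
pred = blup_predict(K, y[train_idx], train_idx, test_idx, h2)

# Predictive ability: correlation between predicted and observed phenotypes.
ability = np.corrcoef(pred, y[test_idx])[0, 1]
print(f"predictive ability: {ability:.2f}")
```

Because the prediction only sees the markers through `K`, how well it works depends on how well the markers reflect the causal variants, which is the dependence on LD described above.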
