Abstract

Background: Genome-wide association studies (GWAS) are widely used to identify regions of the genome that harbor genetic determinants of quantitative traits. However, the multiple-testing burden from scanning tens of millions of whole-genome sequence variants reduces the power to identify associated variants, especially if sample size is limited. In addition, factors such as inaccurate imputation, complex linkage disequilibrium structures, and multiple closely located causal variants may result in a causative mutation not being the most significant single nucleotide polymorphism in a particular genomic region. Therefore, the use of information from different sources, particularly variant annotations, was proposed to enhance the fine-mapping of causal variants. Here, we tested whether applying significance thresholds based on variant annotation categories increases the power of GWAS compared with a flat Bonferroni multiple-testing correction.

Results: Whole-genome sequence variants in dairy cattle were categorized according to type and predicted impact. Then, GWAS results for 17 quantitative traits were analyzed for enrichment of association signals in each annotation category. By using annotation categories that were determined with the Variant Effect Predictor software and datasets indicating regions of open chromatin, “low impact” variants were found to be highly enriched. Moreover, when the variants annotated as “modifier” and not located in open chromatin regions were further classified into different types of potential regulatory elements, the high impact variants, moderate impact variants, variants located in the 3′ and 5′ untranslated regions, and variants located in potential non-coding RNA regions exhibited relatively more enrichment. In contrast, a similar study on human GWAS data reported that enrichment of association signals was highest with high impact variants. We observed an increase in power when these variant category-based significance thresholds were applied to GWAS results on stature in Nordic Holstein cattle, as more candidate genes from a previous large GWAS meta-analysis for cattle stature were confirmed.

Conclusions: Use of variant category-based genome-wide significance thresholds can marginally increase the power to detect candidate genes in cattle. With the continued improvements in annotation of the bovine genome, we anticipate that the growing usefulness of variant category-based significance thresholds will be demonstrated.
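To make the contrast between a flat Bonferroni correction and category-based thresholds concrete, the following is a minimal sketch of a weighted Bonferroni scheme. It is an illustration under stated assumptions, not the procedure used in the study: the function names, the toy variant counts, and the fold-enrichment values are all hypothetical. Each category receives a weight proportional to its assumed enrichment, and the weights are normalised to average 1 across variants, so the per-variant thresholds still sum to the overall alpha.

```python
from collections import Counter


def flat_bonferroni(n_tests, alpha=0.05):
    """Single genome-wide threshold: alpha divided by the number of tests."""
    return alpha / n_tests


def category_thresholds(categories, enrichment, alpha=0.05):
    """Per-category p-value thresholds from a weighted Bonferroni correction.

    categories: one annotation label per tested variant.
    enrichment: dict mapping label -> positive fold enrichment
                (hypothetical values, assumed to come from prior data).
    """
    counts = Counter(categories)
    total = sum(counts.values())
    # Average weight across all variants; dividing by it makes weights average to 1.
    mean_weight = sum(counts[c] * enrichment[c] for c in counts) / total
    return {c: alpha * (enrichment[c] / mean_weight) / total for c in counts}


# Toy data (assumed): 1000 variants, most of them "modifier".
categories = ["modifier"] * 900 + ["low"] * 80 + ["moderate"] * 15 + ["high"] * 5
enrichment = {"modifier": 1.0, "low": 3.0, "moderate": 2.0, "high": 2.0}

flat = flat_bonferroni(len(categories))
thresholds = category_thresholds(categories, enrichment)

for label, threshold in sorted(thresholds.items()):
    relation = "less strict" if threshold > flat else "stricter"
    print(f"{label}: {threshold:.3e} ({relation} than flat {flat:.3e})")
```

In this sketch, enriched categories get a more lenient threshold than the flat correction and the large non-enriched category pays for it with a slightly stricter one, which is the trade-off that any category-based scheme has to balance.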

Highlights

  • Genome-wide association studies (GWAS) are widely used to identify regions of the genome that harbor genetic determinants of quantitative traits

  • The phenotypic values used in the association analysis included deregressed proofs that were derived for animals based on the effective daughter contributions of sires and maternal grandsires [22, 23], which were obtained from the NAV routine genetic evaluations by using the MiX99 software [24]

  • We considered regulatory elements (RE) for two reasons: (1) because promoters and transcription factor binding sites are near transcription start sites [36, 37], and regions proximal to genes tend to exhibit greater enrichment of significantly associated variants in GWAS [38]; and (2) predicted RE can potentially help identify causal mutations [34]. Although non-coding RNAs (ncRNAs) play a major role in gene expression regulation [39], their specific functions are largely unknown [40]


Summary

Introduction

Cai et al. Genet Sel Evol (2019) 51:20

Genome-wide association studies (GWAS) are widely used to identify regions of the genome that harbor genetic determinants of quantitative traits. The multiple-testing burden from scanning tens of millions of whole-genome sequence variants reduces the power to identify associated variants, especially if sample size is limited. Factors such as inaccurate imputation, complex linkage disequilibrium structures, and multiple closely located causal variants may result in a causative mutation not being the most significant single nucleotide polymorphism in a particular genomic region. [...] variants which are in perfect, or near-perfect, linkage disequilibrium (LD) with them. To address this problem, additional information from independent sources is needed. Large-scale eQTL studies can be expensive because they require generation of RNA-seq data specific to the population under study, especially in the case of livestock species, for which initiatives such as the GTEx [15] project in humans do not exist.

