Abstract

The new generation of whole genome sequencing platforms offers great possibilities and challenges for dissecting the genetic basis of complex traits. With a very high number of sequence variants, a naïve multiple hypothesis threshold correction hinders the identification of reliable associations by excessively reducing statistical power. In this report, we examine 2 alternative approaches to improve the statistical power of a whole genome association study to detect reliable genetic associations. The approaches were tested using the Genetic Analysis Workshop 19 (GAW19) whole genome sequencing data. The first method estimates the effective number of independent tests actually being performed in a whole genome association project by the use of an extreme value distribution and a set of phenotype simulations. Given the familial nature of the GAW19 data and the finite number of pedigree founders in the sample, the correlation between genotypes is greater than in a set of unrelated samples. Using our procedure, we estimate that the effective number represents only 15% of the total number of tests performed. However, even using this corrected significance threshold, no genome-wide significant association could be detected for systolic and diastolic blood pressure traits. The second approach implements biological relevance-driven hypothesis testing by exploiting prior computational predictions of the effect of nonsynonymous genetic variants detected in a whole genome sequencing association study. This guided testing approach identified 2 promising single-nucleotide polymorphisms (SNPs), 1 for each trait, targeting biologically relevant genes that could help shed light on the genesis of human hypertension. The first gene, PFH14, associated with systolic blood pressure, interacts directly with genes involved in calcium-channel formation; the second gene, MAP4, encodes a microtubule-associated protein and had already been detected by previous genome-wide association study experiments conducted in an Asian population. Our results highlight the need to develop alternative approaches that improve the efficiency of detecting reasonable candidate associations in whole genome sequencing studies.
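The first approach can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that each simulated null phenotype yields one genome-wide minimum p-value, and that this minimum follows a Beta(1, M_eff) distribution, which is the extreme value distribution of the smallest of M_eff independent uniform p-values. The function names and the toy numbers (1,500 independent tests, 2,000 simulations) are hypothetical and chosen only for illustration.

```python
import math
import random


def effective_tests_from_min_pvalues(min_pvalues):
    """Maximum-likelihood estimate of M_eff, assuming each minimum
    p-value is drawn from Beta(1, M_eff) (an assumption of this sketch)."""
    if not min_pvalues:
        raise ValueError("need at least one simulated minimum p-value")
    # Log-likelihood: n*log(M) + (M - 1) * sum(log(1 - p)); maximised at
    # M = -n / sum(log(1 - p)).
    log_sum = sum(math.log1p(-p) for p in min_pvalues)
    return -len(min_pvalues) / log_sum


def corrected_threshold(m_eff, alpha=0.05):
    """Bonferroni-style threshold using the effective number of tests."""
    return alpha / m_eff


def simulate_min_pvalue(n_independent, rng):
    """Toy stand-in for one null phenotype simulation: draws the minimum
    of n_independent uniform p-values by inverse-CDF sampling."""
    u = 1.0 - rng.random()  # in (0, 1], avoids a minimum p-value of 1
    return 1.0 - u ** (1.0 / n_independent)


if __name__ == "__main__":
    rng = random.Random(19)
    true_m_eff = 1500  # hypothetical value for the toy simulation
    mins = [simulate_min_pvalue(true_m_eff, rng) for _ in range(2000)]
    m_eff = effective_tests_from_min_pvalues(mins)
    print(f"estimated M_eff: {m_eff:.0f}")
    print(f"corrected threshold: {corrected_threshold(m_eff):.2e}")
```

In a real analysis the toy `simulate_min_pvalue` would be replaced by the minimum p-value from a full association scan on a simulated phenotype, so that the correlation among genotypes in the pedigrees is reflected in the estimate.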

Highlights

  • The new generation of sequencing platforms has dramatically altered the genetics field by allowing a cost-effective approach to completely explore the information stored in a genome of interest

  • The number of genetic variants identified in such projects is astronomical, and the required multiple hypothesis threshold correction reduces the statistical power

  • We used the whole genome sequencing (WGS) data provided by the Genetic Analysis Workshop 19 (GAW19) organization to test 2 alternative approaches to improve the statistical power of a WGS association study and favor the detection of reliable candidate genes

Introduction

The new generation of sequencing platforms has dramatically altered the genetics field by allowing a cost-effective approach to completely explore the information stored in a genome of interest. Large genetic and epidemiologic projects are beginning to use whole genome sequencing (WGS) to identify genetic variants (especially novel rare variants) that could explain the missing heritability paradox [1]. The paradox was formulated during the genome-wide association (GWA) era, whose platforms are based on the common disease, common variant (CDCV) premise. GWA studies have shown modest success in identifying causal genetic variants associated with common diseases with high social impact, such as diabetes, hypertension, and cancer. The WGS approach tries to fill this large information gap by capturing all available genetic variation, including novel, rare, and private variants that appear only in a specific family. Despite this substantial promise, many statistical and analytical challenges must still be overcome before WGS data can be analyzed efficiently to identify causal genes.
