Abstract
It is extremely expensive to conduct large-sample array- or sequencing-based genome-scale association studies. For a quantitative trait, an extreme case-control study design may improve power and reduce the cost of variant calling. We investigated the performance of the extreme study design when various proportions of samples are selected from the tails of the phenotype distribution. Using simulations, we show that extreme sampling is beneficial when risk genotypes are rare in the population and the effect size is relatively small. Moreover, selecting unequal numbers of cases and controls can increase power further compared with a balanced selection. Applying the design to two data sets, methadone dose data and yearling weight data, we found that extreme sampling with only a fraction of the data can yield results similar to those of the full-data analysis. From the power analysis with simulated data and the application to experimental data, we conclude that when the full data are unavailable because of a restricted budget, an extreme sampling design is worthwhile: it can reduce costs substantially while retaining power qualitatively similar to that of the full-data analysis.
Highlights
It is extremely expensive to conduct large-sample array- or sequencing-based genome-scale association studies
For a quantitative trait, it has been proposed that one cost-effective strategy for enriching the presence, or absence, of a causal allele in a sample and reducing the cost of variant calling is to only take extreme observations of the trait distribution and carry out a case-control study instead of a regular QTL mapping study
We investigate, in a straightforward way, the power of using extreme phenotype samples defined by different thresholds, including unbalanced selections of cases and controls, which have rarely been discussed before
Summary
It is extremely expensive to conduct large-sample array- or sequencing-based genome-scale association studies. An extreme case-control study design may improve power and reduce the cost of variant calling. In recent years, considerable attention has turned to array- or sequencing-based genome-scale association studies with extremely large sample sizes in the search for undiscovered causal variants[3]. For a quantitative trait, one proposed cost-effective strategy for enriching the presence, or absence, of a causal allele in a sample and for reducing the cost of variant calling is to take only the extreme observations of the trait distribution and carry out a case-control study instead of a regular QTL mapping study. We investigate, in a straightforward way, the power of using extreme phenotype samples defined by different thresholds, including unbalanced selections of cases and controls, which have rarely been discussed before. Our observations offer a practical guide for researchers choosing an appropriate threshold for defining cases and controls.
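The selection step described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' simulation code: the function names (`simulate`, `extreme_sample`, `allele_freq`), the additive genotype model, and all parameter values (sample size, minor allele frequency `maf`, effect size `beta`, tail fractions) are assumptions made here for illustration. Controls are taken from the lower tail of the phenotype distribution and cases from the upper tail, and the two tail fractions may differ to give an unbalanced selection.

```python
import random


def simulate(n=2000, maf=0.05, beta=0.3, seed=0):
    """Simulate genotypes (0/1/2 risk alleles) and a quantitative phenotype.

    Assumed model (illustrative only): phenotype = beta * genotype + N(0, 1) noise.
    """
    rng = random.Random(seed)
    genotypes = [int(rng.random() < maf) + int(rng.random() < maf) for _ in range(n)]
    phenotypes = [beta * g + rng.gauss(0.0, 1.0) for g in genotypes]
    return genotypes, phenotypes


def extreme_sample(phenotypes, lower_frac, upper_frac):
    """Return (control_idx, case_idx) drawn from the lower and upper tails.

    lower_frac and upper_frac need not be equal, which allows an
    unbalanced selection of controls and cases.
    """
    if lower_frac <= 0 or upper_frac <= 0 or lower_frac + upper_frac > 1:
        raise ValueError("tail fractions must be positive and sum to at most 1")
    n = len(phenotypes)
    order = sorted(range(n), key=phenotypes.__getitem__)  # indices, ascending phenotype
    n_ctrl = round(n * lower_frac)
    n_case = round(n * upper_frac)
    controls = order[:n_ctrl]
    cases = order[n - n_case:] if n_case else []
    return controls, cases


def allele_freq(genotypes, idx):
    """Risk-allele frequency among the individuals listed in idx."""
    return sum(genotypes[i] for i in idx) / (2 * len(idx))


if __name__ == "__main__":
    genotypes, phenotypes = simulate()
    # Unbalanced selection: bottom 20% as controls, top 10% as cases.
    controls, cases = extreme_sample(phenotypes, lower_frac=0.20, upper_frac=0.10)
    print("controls:", len(controls), "risk-allele freq:", allele_freq(genotypes, controls))
    print("cases:   ", len(cases), "risk-allele freq:", allele_freq(genotypes, cases))
```

Only the selected individuals would then be genotyped or sequenced and compared in a standard case-control test. Repeating the simulation over many seeds and thresholds is one way to estimate how power changes with the tail fractions.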