Abstract

Background: High dimensional case control studies are ubiquitous in the biological sciences, particularly genomics. To maximise power while constraining cost and to minimise type-1 error rates, researchers typically seek to replicate findings in a second experiment on independent cohorts before proceeding with further analyses. This can be an expensive procedure, particularly when control samples are difficult to recruit or ascertain; for example in inter-disease comparisons, or studies on degenerative diseases.

Results: This paper presents a method in which control (or case) samples from the discovery cohort are re-used in a replication study. The theoretical implications of this method are discussed and simulated genome-wide association study (GWAS) tests are used to compare performance against the standard approach in a range of circumstances.

Using similar methods, a procedure is proposed for ‘partial replication’ using a new independent cohort consisting of only controls. This method can be used to provide some validation of findings when a full replication procedure is not possible.

The new method has differing sensitivity to confounding in study cohorts compared to the standard procedure, which must be considered in its application. Type-1 error rates in these scenarios are analytically and empirically derived, and an online tool for comparing power and error rates is provided.

Conclusions: In several common study designs, a shared-control method allows a substantial improvement in power while retaining type-1 error rate control. Although careful consideration must be made of all necessary assumptions, this method can enable more efficient use of data in GWAS and other applications.

Highlights

  • High dimensional case control studies are ubiquitous in the biological sciences, particularly genomics

  • We assume a genome-wide association study (GWAS) dataset consisting of a set of cases C1 and controls C0 used in the ‘discovery’ phase of a GWAS or similar study, and corresponding sets of cases and controls C1′, C0′ in the replication phase

  • We assume that C0 and C1 are genotyped at a set of single-nucleotide polymorphisms (SNPs) S, and C0′, C1′ at a set S′ ⊆ S


Summary

Introduction

High dimensional case control studies are ubiquitous in the biological sciences, particularly genomics. To maximise power while constraining cost and to minimise type-1 error rates, researchers typically seek to replicate findings in a second experiment on independent cohorts before proceeding with further analyses. This can be an expensive procedure, particularly when control samples are difficult to recruit or ascertain; for example in inter-disease comparisons, or studies on degenerative diseases. …ing measurement of all variables in all samples. It serves to protect against false positives due to systematic errors in the original datasets, by re-testing association in a second, nominally independent dataset. Results from original and replication datasets for which some or all controls are shared cannot be directly compared, due to the correlation between test statistics.
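The correlation induced by sharing controls can be illustrated with a small null simulation. The sketch below is not the paper's method or its online tool; it assumes a simple allele-count z test for a difference in allele frequency, fully shared controls, and no true association, and all function names and parameter values are hypothetical. Under those assumptions, a standard large-sample result gives the correlation between the discovery and replication z scores as sqrt(n1·n1′ / ((n1 + n0)(n1′ + n0))), where n1 and n1′ are the numbers of discovery and replication cases and n0 is the number of shared controls.

```python
import numpy as np


def z_scores(case_counts, n_cases, control_counts, n_controls):
    """Two-sample z statistic for a difference in allele frequency.

    Counts are minor-allele counts out of 2 * n alleles per group.
    """
    p_case = case_counts / (2 * n_cases)
    p_ctrl = control_counts / (2 * n_controls)
    # Pooled allele frequency, used for the variance under the null.
    p_pool = (case_counts + control_counts) / (2 * (n_cases + n_controls))
    se = np.sqrt(p_pool * (1 - p_pool)
                 * (1 / (2 * n_cases) + 1 / (2 * n_controls)))
    return (p_case - p_ctrl) / se


def simulate_null(n_snps=20000, n_cases_disc=1000, n_cases_rep=1000,
                  n_controls=1000, maf=0.3, seed=0):
    """Simulate null SNPs (same allele frequency in every group).

    Returns the discovery z scores, the replication z scores computed
    with the SAME controls, and the replication z scores computed with
    a fresh, independent control cohort.
    """
    rng = np.random.default_rng(seed)
    disc_cases = rng.binomial(2 * n_cases_disc, maf, size=n_snps)
    rep_cases = rng.binomial(2 * n_cases_rep, maf, size=n_snps)
    shared_ctrl = rng.binomial(2 * n_controls, maf, size=n_snps)
    fresh_ctrl = rng.binomial(2 * n_controls, maf, size=n_snps)

    z_disc = z_scores(disc_cases, n_cases_disc, shared_ctrl, n_controls)
    z_rep_shared = z_scores(rep_cases, n_cases_rep, shared_ctrl, n_controls)
    z_rep_indep = z_scores(rep_cases, n_cases_rep, fresh_ctrl, n_controls)
    return z_disc, z_rep_shared, z_rep_indep


def expected_correlation(n_cases_disc, n_cases_rep, n_controls):
    """Large-sample null correlation of z scores with fully shared controls."""
    return np.sqrt(n_cases_disc * n_cases_rep
                   / ((n_cases_disc + n_controls) * (n_cases_rep + n_controls)))


if __name__ == "__main__":
    z_disc, z_shared, z_indep = simulate_null()
    print("shared controls:     ", np.corrcoef(z_disc, z_shared)[0, 1])
    print("independent controls:", np.corrcoef(z_disc, z_indep)[0, 1])
    print("expected (shared):   ", expected_correlation(1000, 1000, 1000))
```

With equal numbers of cases and controls in both phases, the expected correlation is 0.5, so a SNP that passes the discovery threshold by chance is considerably more likely to pass a naive replication test as well. This is why the two sets of results cannot be treated as independent when controls are re-used.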

