Abstract
Background
High dimensional case control studies are ubiquitous in the biological sciences, particularly genomics. To maximise power while constraining cost and to minimise type-1 error rates, researchers typically seek to replicate findings in a second experiment on independent cohorts before proceeding with further analyses. This can be an expensive procedure, particularly when control samples are difficult to recruit or ascertain; for example in inter-disease comparisons, or studies on degenerative diseases.
Results
This paper presents a method in which control (or case) samples from the discovery cohort are re-used in a replication study. The theoretical implications of this method are discussed, and simulated genome-wide association study (GWAS) tests are used to compare its performance against the standard approach in a range of circumstances. Using similar methods, a procedure is proposed for ‘partial replication’ using a new independent cohort consisting of only controls. This method can be used to provide some validation of findings when a full replication procedure is not possible. The new method differs from the standard procedure in its sensitivity to confounding in study cohorts, which must be considered in its application. Type-1 error rates in these scenarios are derived analytically and empirically, and an online tool for comparing power and error rates is provided.
Conclusions
In several common study designs, a shared-control method allows a substantial improvement in power while retaining type-1 error rate control. Although all necessary assumptions must be considered carefully, this method can enable more efficient use of data in GWAS and other applications.
Highlights
High dimensional case control studies are ubiquitous in the biological sciences, particularly genomics
We assume a dataset consisting of a set of cases C1 and controls C0 used in the ‘discovery’ phase of a genome-wide association study (GWAS) or similar study, and corresponding sets of cases and controls C1′, C0′ in the replication phase
We assume that C0 and C1 are genotyped at a set of single-nucleotide polymorphisms (SNPs) S, and C0′, C1′ at a set S′ ⊆ S
Summary
High dimensional case control studies are ubiquitous in the biological sciences, particularly genomics. To maximise power while constraining cost and to minimise type-1 error rates, researchers typically seek to replicate findings in a second experiment on independent cohorts before proceeding with further analyses. This can be an expensive procedure, particularly when control samples are difficult to recruit or ascertain; for example in inter-disease comparisons, or studies on degenerative diseases. […]ing measurement of all variables in all samples. Replication serves to protect against false positives due to systematic errors in the original datasets, by re-testing association in a second, nominally independent dataset. Results from original and replication datasets for which some or all controls are shared cannot be directly compared, due to the correlation between test statistics.
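The correlation between discovery and replication test statistics under shared controls can be illustrated with a small null simulation. The sketch below is not the paper's method: it assumes a simple two-sample z-test on allele frequencies, fully shared controls, and no true association, and every function name and parameter value is hypothetical. Under a normal approximation, the correlation between the two z-scores is roughly (1/n0) / sqrt((1/n1 + 1/n0)(1/n1′ + 1/n0)), where n1, n1′ and n0 are the numbers of discovery cases, replication cases and shared controls.

```python
import math
import random


def allele_freq(rng, n_people, maf):
    """Sample allele frequency from 2 * n_people independent allele draws."""
    n_alleles = 2 * n_people
    return sum(rng.random() < maf for _ in range(n_alleles)) / n_alleles


def z_score(p_case, n_case, p_ctrl, n_ctrl):
    """Two-sample z statistic for a case/control difference in allele frequency."""
    pooled = (p_case * n_case + p_ctrl * n_ctrl) / (n_case + n_ctrl)
    se = math.sqrt(pooled * (1 - pooled) * (1 / (2 * n_case) + 1 / (2 * n_ctrl)))
    return (p_case - p_ctrl) / se


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def expected_correlation(n_case, n_case_rep, n_ctrl):
    """Approximate null correlation of the two z-scores when controls are shared."""
    return (1 / n_ctrl) / math.sqrt(
        (1 / n_case + 1 / n_ctrl) * (1 / n_case_rep + 1 / n_ctrl)
    )


def simulate(n_snps=2000, n_case=200, n_case_rep=200, n_ctrl=200, maf=0.3, seed=1):
    """Empirical correlation of discovery and replication z-scores under the null."""
    rng = random.Random(seed)
    z_disc, z_rep = [], []
    for _ in range(n_snps):
        p_ctrl = allele_freq(rng, n_ctrl, maf)          # the same controls serve both phases
        p_case = allele_freq(rng, n_case, maf)          # discovery cases
        p_case_rep = allele_freq(rng, n_case_rep, maf)  # independent replication cases
        z_disc.append(z_score(p_case, n_case, p_ctrl, n_ctrl))
        z_rep.append(z_score(p_case_rep, n_case_rep, p_ctrl, n_ctrl))
    return pearson(z_disc, z_rep)


if __name__ == "__main__":
    print("empirical:", round(simulate(), 3))
    print("approximate theory:", expected_correlation(200, 200, 200))
```

With equal group sizes the approximation gives a correlation of 0.5, so a replication p-value computed as if the two tests were independent would overstate the evidence; this is why shared-control results need an adjusted comparison rather than the standard one.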