Abstract

Background: Starting from a forensic problem, Homer et al. showed that it is possible to detect whether an individual contributes as little as 0.5% of the DNA in a pool. The finding was extended to show that one can detect whether a subject participated in a small homogeneous GWAS. We denote this as the detection of a subject belonging to a certain cohort (SBCC). Subsequently, Visscher and Hill showed that the power to detect an SBCC signal in an ethnically homogeneous cohort depends roughly on the ratio of the number of independent markers to the total sample size. However, it is not clear whether the same holds for more ethnically diverse cohorts. Later, Masca et al. proposed, as an SBCC test, a regression of i) the departure of the subject genotype from the assumed population frequency on ii) the departure of the frequency in the cohort of interest from that same population frequency. They used simulations to show that this approach has better SBCC detection power than the original Homer method but is impeded by population stratification.

Approach: To investigate the possibility of SBCC detection in multi-ethnic cohorts, we generalize the Masca et al. approach by theoretically deriving the correlation between a subject genotype and the cohort reference allele frequencies (RAFs) for stratified cohorts. Based on the derived formula, we show theoretically that, because of background stratification noise, SBCC detection is unlikely even for mildly stratified cohorts larger than about a thousand subjects. Thus, for the vast majority of contemporary cohorts, the fear of compromising privacy via SBCC detection is unfounded.

Highlights

  • Starting from a forensic problem, Homer et al. showed that it is possible to detect whether an individual contributes as little as 0.5% of the DNA in a pool

  • Spurred by stricter NIMH requirements for data sharing, at the beginning of the genome-wide association study (GWAS) era most researchers promptly published summary statistics from their studies, e.g. Z-scores, odds ratios (ORs) and even reference allele frequencies

  • The authors extended the findings to show that it is possible to detect whether a subject participated in a small (N ≈ 1,500) homogeneous GWAS by using only summary statistics and reference allele frequencies (RAFs)

Summary

Background

Starting from a forensic problem, Homer et al. showed that it is possible to detect whether an individual contributes as little as 0.5% of the DNA in a pool. Visscher and Hill showed that the power to detect an SBCC signal in an ethnically homogeneous cohort depends roughly on the ratio of the number of independent markers to the total sample size. It is not clear whether the same holds for more ethnically diverse cohorts. Masca et al. proposed, as an SBCC test, a regression of i) the departure of the subject genotype from the assumed population frequency on ii) the departure of the frequency in the cohort of interest from that same population frequency. They used simulations to show that this approach has better SBCC detection power than the original Homer method but is impeded by population stratification.
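
The regression test described above can be illustrated with a small simulation. What follows is a minimal sketch under strong simplifying assumptions, not the implementation used by Masca et al. or by the authors: markers are independent, the cohort is ethnically homogeneous, and the population frequencies are known exactly. The function name `sbcc_t_statistic` and all simulation parameters are hypothetical choices made for illustration only.

```python
import numpy as np

def sbcc_t_statistic(subject_genotype, cohort_freq, pop_freq):
    """t-statistic for the slope of the regression of
    (subject allele frequency - population frequency) on
    (cohort frequency - population frequency).
    Hypothetical helper; a sketch of the regression idea only."""
    y = subject_genotype / 2.0 - pop_freq   # subject departure (genotype coded 0/1/2)
    x = cohort_freq - pop_freq              # cohort departure
    xc = x - x.mean()                       # centering is equivalent to fitting an intercept
    yc = y - y.mean()
    sxx = np.dot(xc, xc)
    slope = np.dot(xc, yc) / sxx
    resid = yc - slope * xc
    dof = len(x) - 2
    se = np.sqrt(np.dot(resid, resid) / dof / sxx)
    return slope / se

# Assumed toy setting: independent markers, one homogeneous population.
rng = np.random.default_rng(0)
n_markers, n_cohort = 10_000, 100
pop_freq = rng.uniform(0.1, 0.9, size=n_markers)

# Cohort genotypes drawn under Hardy-Weinberg proportions at each marker.
cohort = rng.binomial(2, pop_freq, size=(n_cohort, n_markers))
cohort_freq = cohort.mean(axis=0) / 2.0

# One subject who is in the cohort and one who is not.
outsider = rng.binomial(2, pop_freq)
t_member = sbcc_t_statistic(cohort[0], cohort_freq, pop_freq)
t_outsider = sbcc_t_statistic(outsider, cohort_freq, pop_freq)

print(f"member t = {t_member:.2f}, non-member t = {t_outsider:.2f}")
```

In this idealized setting the member's t-statistic is expected to be close to the square root of the ratio of markers to cohort size, which matches the dependence on that ratio attributed to Visscher and Hill, while a non-member's statistic behaves roughly like a standard normal draw. The sketch deliberately omits population stratification, which is the complication the paper addresses.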
