Abstract

Background
Using family health history as a proxy phenotype has become a common approach in genome-wide association studies (GWAS) of Alzheimer's disease (AD). This approach leverages large samples in population biobanks, where most middle-aged participants have no AD diagnosis. However, methodological issues in GWAS-by-proxy (GWAX) and the quality of its association results have not been carefully investigated.

Method
We performed GWAX on parental AD history in UK Biobank (N = 48,031 proxy cases and 315,286 controls). We then applied GWAS-by-subtraction to obtain the non-AD component underlying AD GWAX. We computed genetic correlations of AD case-control GWAS, AD GWAX, and the non-AD component underlying GWAX with a suite of complex traits. We also explored the impact of various quality control procedures and additional covariates in AD GWAX analysis.

Result
AD GWAS and GWAX showed drastically different genetic correlations with several other complex traits, especially cognitive abilities (Figure 1). AD GWAX had a surprising positive genetic correlation with higher education (rg = 0.167, p = 1.7e-11), as opposed to the negative AD-education correlation observed for GWAS (rg = -0.133, p = 2.4e-5). The positive genetic correlation between AD GWAX and cognition is explained by the non-AD component in AD GWAX (rg = 0.260 with education, p = 5.2e-11). The non-AD component in GWAX was also genetically correlated with lower risk of health outcomes such as coronary artery disease (rg = -0.136, p = 2.8e-3) and diabetes (rg = -0.206, p = 2.8e-3), indicating substantial survival bias in samples reporting parental AD history. Adjusting for parental age in GWAX substantially reduced but did not fully remove the genetic correlation of AD and higher education (rg = 0.013, p = 0.67), hinting at additional unaccounted factors in the current GWAX practice.
We also provide evidence that the genetic correlation between family health history awareness and education (rg = 0.285, p = 1.4e-10) leads to these biases in GWAX associations.

Conclusion
GWAX based on parental health history introduces substantial and systematic biases in AD genetic associations due to the non-representativeness of the biobank health history survey. Naively combining GWAX with regular case-control GWAS will lead to misleading results, both at the single-variant level and in genome-wide analyses such as genetic correlation estimation and polygenic risk score applications.
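
For readers unfamiliar with the proxy phenotype described in the Method, the following is a minimal Python sketch of one common way to code it from self-reported parental history. The function name `proxy_ad_status`, its arguments, and the handling of missing answers are illustrative assumptions; the abstract does not specify the exact coding rule used in the study.

```python
from typing import Optional


def proxy_ad_status(father_ad: Optional[bool],
                    mother_ad: Optional[bool]) -> Optional[int]:
    """Code a proxy AD phenotype from self-reported parental history.

    ASSUMED rule (hypothetical, not taken from the abstract):
      1    -> proxy case: at least one parent reported with AD
      0    -> proxy control: both parents reported without AD
      None -> excluded: no reported AD and at least one answer missing
    """
    if father_ad is True or mother_ad is True:
        return 1
    if father_ad is False and mother_ad is False:
        return 0
    # A missing answer ("do not know", "prefer not to answer") cannot
    # confirm control status, so the participant is left out.
    return None


# Hypothetical participants: (father_ad, mother_ad)
participants = [(True, False), (False, False), (None, False), (None, True)]
statuses = [proxy_ad_status(f, m) for f, m in participants]
print(statuses)
```

Under this assumed rule, participants who cannot report on a parent are dropped rather than treated as controls, so the analyzed sample depends on who is aware of their family health history. That dependence is the channel through which the awareness-education correlation reported in the Result can enter GWAX associations.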