Strategies to improve the performance of rare variant association studies by optimizing the selection of controls.

Na Zhu,Peter M Krawitz,Peter N Robinson,Stefan Mundlos,Jochen Hecht,Tom Kamphans,Thorsten Dickhaus,Verena Heinrich

doi:10.1093/bioinformatics/btv457

Na Zhu, Peter M Krawitz + Show 6 more

Open Access

PDF Available

https://doi.org/10.1093/bioinformatics/btv457

Copy DOI

Export

Save

Cite

Abstract
Full-Text PDF
Similar Papers

Abstract

Listen

When analyzing a case group of patients with ultra-rare disorders the ethnicities are often diverse and the data quality might vary. The population substructure in the case group as well as the heterogeneous data quality can cause substantial inflation of test statistics and result in spurious associations in case-control studies if not properly adjusted for. Existing techniques to correct for confounding effects were especially developed for common variants and are not applicable to rare variants. We analyzed strategies to select suitable controls for cases that are based on similarity metrics that vary in their weighting schemes. We simulated different disease entities on real exome data and show that a similarity-based selection scheme can help to reduce false positive associations and to optimize the performance of the statistical tests. Especially when data quality as well as ethnicities vary a lot in the case group, a matching approach that puts more weight on rare variants shows the best performance. We reanalyzed collections of unrelated patients with Kabuki make-up syndrome, Hyperphosphatasia with Mental Retardation syndrome and Catel-Manzke syndrome for which the disease genes were recently described. We show that rare variant association tests are more sensitive and specific in identifying the disease gene than intersection filters and should thus be considered as a favorable approach in analyzing even small patient cohorts. Datasets used in our analysis are available at ftp://ftp.1000genomes.ebi.ac.uk./vol1/ftp/ : peter.krawitz@charite.de Supplementary data are available at Bioinformatics online.

Full Text