Abstract
Introduction & Objective: Rare variants, with allele frequency below 1%, are postulated to be associated with disease susceptibility. Because allele frequencies vary globally, using population control data that do not match the study population can introduce bias. The primary research question is which factors explain variation in allele frequency across populations. The secondary question is how much bias may arise from using population-level data as controls when studying rare variants. We use data from gnomAD (Genome Aggregation Database) to answer these questions.
Methods: We apply three model formulations (Linear, Logistic, and Poisson) to explain how the frequency or count of variants depends on population subgroup/ancestry, functional annotation, sex, and disease status. We also evaluate interactions between population subgroup and functional annotation.
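As an illustration of why the three formulations can disagree, the sketch below contrasts two allele frequencies on each model's natural scale: a difference (linear), a log odds ratio (logistic), and a log rate ratio (Poisson). The function name and the frequencies are hypothetical and are not taken from the study.

```python
import math

def effect_on_each_scale(freq_a, freq_b):
    """Contrast two allele frequencies on the scale each model works on."""
    odds_a = freq_a / (1.0 - freq_a)
    odds_b = freq_b / (1.0 - freq_b)
    return {
        "linear": freq_b - freq_a,                # identity link: frequency difference
        "logistic": math.log(odds_b / odds_a),    # logit link: log odds ratio
        "poisson": math.log(freq_b / freq_a),     # log link: log rate ratio
    }

# Hypothetical frequencies for one variant class in two populations.
effects = effect_on_each_scale(0.0005, 0.001)
for scale, value in effects.items():
    print(f"{scale:>8}: {value:.6f}")
```

For frequencies this small the logistic and Poisson contrasts nearly coincide, while the linear contrast is on a different scale altogether, so an effect or interaction that is absent on one scale need not be absent on another.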
Results: For very rare variants (allele frequency < 0.1%), likelihood ratio tests (LRTs) provide evidence that allele frequencies vary with functional annotation and population under all three model formulations. By LRT, interactions between population and functional annotation are significant in the Logistic and Poisson models. Goodness-of-fit statistics indicate that the linear model fits very rare variants better than low-frequency variants.
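To make the likelihood ratio test concrete, here is a minimal sketch for the simplest Poisson case: a single common variant rate (null) against one rate per population (alternative), with allele number as the exposure. The counts and exposures are invented for illustration, and the helper names are assumptions. The study's actual models include further covariates and interactions.

```python
import math

def poisson_lrt(counts, exposures):
    """LRT statistic and degrees of freedom: common rate vs. one rate per group."""
    pooled_rate = sum(counts) / sum(exposures)
    stat = 0.0
    for y, n in zip(counts, exposures):
        if y > 0:  # a zero count contributes nothing to the statistic
            stat += 2.0 * y * math.log((y / n) / pooled_rate)
    return stat, len(counts) - 1

def chi2_sf_even_df(x, df):
    """Chi-square upper tail probability; closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

# Hypothetical variant counts and allele numbers for three populations.
counts = [12, 30, 9]
exposures = [20000, 20000, 20000]
stat, df = poisson_lrt(counts, exposures)
p_value = chi2_sf_even_df(stat, df)
print(f"LRT statistic = {stat:.2f}, df = {df}, p = {p_value:.4g}")
```

The closed-form tail probability is used only to keep the sketch dependency-free. For odd degrees of freedom, or for models with covariates, a statistics library would be the usual choice.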
Conclusion: We observe that population and functional annotation affect variant frequencies, and conclude that detecting differences across populations and annotations depends on the model scale, especially across different degrees of rareness. Statisticians therefore need to consider the potential for bias carefully when using gnomAD as control data. With that caveat, gnomAD remains a valuable resource for studies of rare variants.