Abstract
Background
The Alzheimer’s Disease Sequencing Project (ADSP) has released the R3 whole‐genome sequencing (WGS) dataset of 16,905 samples collected by 17 study cohorts across three major populations: 10,651 Non‐Hispanic Whites, 3,212 Hispanics, and 2,874 African Americans, plus 168 others. The release also includes a callset of 206 million biallelic single nucleotide variants (SNVs), 16 million biallelic insertions/deletions (INDELs), and 28 million multiallelic variants. Leveraging this dataset, we are conducting variant‐, gene‐, and gene‐set‐level association analyses to identify genetic risk factors for Alzheimer’s disease (AD).
Method
We assembled the sample list, kept only genetically unique individuals, and removed samples with unclear AD diagnoses. Samples were further removed if they deviated by more than 10 standard deviations (SD) on any ADSP‐provided quality metric: genotype missing rate, singleton rate, heterozygous/homozygous ratio, or transition/transversion ratio. Multiallelic variants were split into biallelic variants and, after QC, merged with the biallelic callset. We removed variants that did not receive the GATK “PASS” filter status, were monomorphic, had an allele balance for heterozygous calls (ABHet) <0.25 or >0.75, or had a genotype missing rate >0.05. RUTH was used to perform a robust unified Hardy‐Weinberg equilibrium test. We applied GENESIS PC‐AiR to conduct principal component (PC) analysis. We are performing association analyses with two approaches: using all samples while adjusting for genetic ancestry, and within population‐specific groups defined by PC clustering (10 SD) around reference populations from the Human Genome Diversity Project. Our analytical model will adjust for sex, PCs, APOE, and technical covariates.
Result
The merged biallelic and multiallelic callset contained 223 million variants. The minor allele frequency (MAF) distribution was as follows: 88.2% of variants had MAF<0.001, 6.5% had 0.001<=MAF<0.01, 2.3% had 0.01<=MAF<0.05, and 3.0% had MAF>=0.05.
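The sample‐ and variant‐level QC rules in Method, and the MAF bins reported in Result, can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the ADSP pipeline: the record layouts and field names (`missing_rate`, `het_hom_ratio`, `filter_status`, `alt_count`, `called_alleles`, `ab_het`) are hypothetical, the SD is computed over whatever sample set is passed in (the abstract does not say whether it was computed across all samples or per cohort), and the RUTH Hardy‐Weinberg test is not reproduced.

```python
from dataclasses import dataclass
from statistics import mean, pstdev

# Illustrative keys for the four ADSP-provided sample QC metrics (hypothetical names).
SAMPLE_METRICS = ("missing_rate", "singleton_rate", "het_hom_ratio", "ti_tv_ratio")


def flag_sample_outliers(samples, metrics=SAMPLE_METRICS, max_sd=10.0):
    """Return the IDs of samples lying more than max_sd SDs from the mean on any metric."""
    flagged = set()
    for metric in metrics:
        values = [s[metric] for s in samples]
        mu, sd = mean(values), pstdev(values)
        if sd == 0:
            continue  # a constant metric cannot produce outliers
        for s in samples:
            if abs(s[metric] - mu) > max_sd * sd:
                flagged.add(s["id"])
    return flagged


@dataclass
class Variant:
    """Hypothetical per-variant summary; assumes diploid calls and at least one het call."""
    filter_status: str   # VCF FILTER value, e.g. "PASS"
    alt_count: int       # ALT alleles among non-missing genotypes
    called_alleles: int  # 2 x number of samples with a non-missing genotype
    ab_het: float        # allele balance across heterozygous calls (ABHet)
    missing_rate: float  # fraction of samples with a missing genotype


def keep_variant(v, ab_min=0.25, ab_max=0.75, max_missing=0.05):
    """Return True if the variant passes the variant-level filters listed in Method."""
    if v.filter_status != "PASS":
        return False
    if v.alt_count in (0, v.called_alleles):  # monomorphic among called genotypes
        return False
    if not ab_min <= v.ab_het <= ab_max:
        return False
    return v.missing_rate <= max_missing


def maf_bin(maf):
    """Place a minor allele frequency into the bins reported in Result."""
    if maf < 0.001:
        return "MAF<0.001"
    if maf < 0.01:
        return "0.001<=MAF<0.01"
    if maf < 0.05:
        return "0.01<=MAF<0.05"
    return "MAF>=0.05"
```

A variant with ABHet exactly 0.25 or a missing rate exactly 0.05 is kept, matching the strict inequalities (<0.25, >0.75, >0.05) that define removal above.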
PCs have been generated for the ADSP data. We are initiating association analyses, including single‐variant analysis (using GENESIS), gene‐based analysis (using SKAT‐O), and rare noncoding variant‐set testing (using STAAR).
Conclusion
With this project, we will examine known AD‐related variants and identify novel ones. Because the dataset includes participants of diverse genetic ancestry, it will enable us to investigate how AD‐associated genetic risk factors differ across populations.