Abstract

Background

Small insertions and deletions (indels) in the human genome are substantial contributors to genetic variation and affect human traits and diseases. The role of indels in late‐onset Alzheimer’s disease (LOAD) has been understudied. The few known examples, such as the intronic poly‐T variant in the TOMM40 gene, suggest that systematic exploration of indels within LOAD risk regions will advance the understanding of the genetic architecture of LOAD. We previously developed a bioinformatics pipeline that characterizes and prioritizes candidate regulatory SNPs in enhancers located in LOAD‐GWAS regions. Here we extend the pipeline to the analysis of indels. The proposed pipeline progresses from indels located in LOAD‐GWAS regions to a filtered set of regulatory variants that have a predicted strong effect on transcription factor (TF) binding.

Method

The pipeline used publicly available functional genomics data sources, primarily candidate cis‐regulatory elements (cCREs) from ENCODE and single‐cell RNA‐seq data from LOAD patient samples (Synapse: syn22079621). For TF binding analysis we employed motifs from MotifDb. In addition, we used various bioinformatics software, including motifbreakR.

Result

We catalogued 1230 proximal CTCF‐bound cCREs in LOAD‐GWAS regions, of which 912 showed epigenetic evidence in relevant brain tissue. We catalogued 426 indels in these cCREs. These indels disrupted binding motifs for 391 TFs, 362 of which had snRNA‐seq data from LOAD samples. Of note, TF motifs within the APOE‐TOMM40, SPI1 and MS4A2 regions were significantly disrupted by the candidate regulatory indels. Among these TFs are RUNX3, SPI1 and SMAD3. These significant findings for the APOE‐TOMM40, SPI1 and MS4A2 regions are consistent with our prior results for SNPs.

Conclusion

This study provides an analytical framework to catalogue noncoding indel variation in cis‐regulatory elements located in LOAD‐GWAS loci and to characterize the likelihood that these indels perturb TF binding. The approach integrates multiple data types to prioritize genes and variants for validation experiments using disease models and gene editing technologies.
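The filtering logic described in the abstract (indels in LOAD‐GWAS regions, restricted to those inside cCREs, then to those with a predicted strong effect on a TF that has expression data) can be illustrated with a minimal Python sketch. This is not the authors' pipeline, which relies on motifbreakR and other tools. The class names, the `effect` label, the coordinates, and the variant and TF identifiers below are all hypothetical toy values chosen for illustration, and they do not correspond to any result reported here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A named genomic interval; coordinates are 0-based, half-open."""
    chrom: str
    start: int
    end: int
    name: str


@dataclass(frozen=True)
class MotifHit:
    """Hypothetical record: one indel predicted to alter one TF motif."""
    indel: str
    tf: str
    effect: str  # assumed labels: "strong" or "weak"


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two intervals share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def indels_in_ccres(indels, ccres):
    """Step 1: keep (indel, cCRE) pairs where the indel falls in a cCRE."""
    return [(i, c) for i in indels for c in ccres if overlaps(i, c)]


def prioritize(pairs, hits, expressed_tfs):
    """Step 2: keep strong-effect hits on TFs that have expression data."""
    in_ccre = {indel.name for indel, _ in pairs}
    return [
        h for h in hits
        if h.indel in in_ccre and h.effect == "strong" and h.tf in expressed_tfs
    ]


# Toy inputs (made up; not data from the study).
ccres = [
    Interval("chr19", 100, 200, "cCRE_1"),
    Interval("chr11", 500, 600, "cCRE_2"),
]
indels = [
    Interval("chr19", 150, 153, "indel_A"),  # inside cCRE_1
    Interval("chr19", 300, 301, "indel_B"),  # outside any cCRE
    Interval("chr11", 598, 602, "indel_C"),  # partially overlaps cCRE_2
]
hits = [
    MotifHit("indel_A", "TF1", "strong"),
    MotifHit("indel_A", "TF2", "weak"),
    MotifHit("indel_B", "TF3", "strong"),
    MotifHit("indel_C", "TF3", "strong"),
]
expressed_tfs = {"TF1", "TF3"}

pairs = indels_in_ccres(indels, ccres)
prioritized = prioritize(pairs, hits, expressed_tfs)
for hit in prioritized:
    print(hit.indel, hit.tf, hit.effect)
```

The nested loop in `indels_in_ccres` is quadratic and is used only for readability. For genome‐scale inputs an interval tree or a sorted sweep would be the more usual choice.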
