Abstract

Background: Genetic variation can alter transcriptional regulatory activity, contributing to variation in complex traits and risk of disease, but identifying individual variants that affect regulatory activity has been challenging. Quantitative sequence-based experiments such as ChIP-seq and DNase-seq can detect sites of allelic imbalance where alleles contribute disproportionately to the overall signal, suggesting allelic differences in regulatory activity.

Methods: We created an allelic imbalance detection pipeline, AA-ALIGNER, to remove reference mapping biases influencing allelic imbalance detection and evaluate accuracy of allelic imbalance predictions in the absence of complete genotype data. Using the sequence aligner, GSNAP, and varying amounts of genotype information to remove mapping biases, we investigated the accuracy of allelic imbalance detection (binomial test) in CREB1 ChIP-seq reads from the GM12878 cell line. Additionally, we thoroughly evaluated the influence of experimental and analytical parameters on imbalance detection.

Results: Compared to imbalances identified using complete genotypes, using imputed partial sample genotypes, AA-ALIGNER detected >95 % of imbalances with >90 % accuracy. AA-ALIGNER performed nearly as well using common variants when genotypes were unknown. In contrast, predicting additional heterozygous sites and imbalances using the sequence data led to >50 % false positive rates. We evaluated effects of experimental data characteristics and key analytical parameter settings on imbalance detection. Overall, total base coverage and signal dispersion across the genome most affected our ability to detect imbalances, while parameters such as imbalance significance, imputation quality thresholds, and alignment mismatches had little effect. To assess the biological relevance of imbalance predictions, we used electrophoretic mobility shift assays to functionally test for predicted allelic differences in CREB1 binding in the GM12878 lymphoblast cell line. Six of nine tested variants exhibited allelic differences in binding. Two of these variants, rs2382818 and rs713875, are located within inflammatory bowel disease-associated loci.

Conclusions: AA-ALIGNER accurately detects allelic imbalance in quantitative sequence data using partial genotypes or common variants, filling a critical methodological gap in these analyses, as full genotypes are rarely available. Importantly, we demonstrate how experimental and analytical features impact imbalance detection, providing guidance for similar future studies.

Electronic supplementary material: The online version of this article (doi:10.1186/s12920-015-0117-x) contains supplementary material, which is available to authorized users.

Highlights

  • Genetic variation can alter transcriptional regulatory activity, contributing to variation in complex traits and risk of disease, but identifying individual variants that affect regulatory activity has been challenging.

  • Using BWA alignments that did not include any variant information, we predicted heterozygous sites and allelic imbalances as above. If we separate these predictions into those sites that are and are not common variants, we find that the sensitivity and precision are significantly higher for common variants (Additional file 2: Table S1). Table 2: Allelic imbalance detection accuracy in alignments using partial or no genotypes compared to complete genotypes.

  • We identified 238 heterozygous sites in GM12878 that are in linkage disequilibrium (1000 Genomes EUR; r² ≥ 0.8) with one of 218 index SNPs reported for a genome-wide association study (GWAS, P < 1.0 × 10⁻⁵) [39].

Introduction

Genetic variation can alter transcriptional regulatory activity, contributing to variation in complex traits and risk of disease, but identifying individual variants that affect regulatory activity has been challenging. Genetic studies of complex traits and diseases have increasingly focused on the contribution of transcriptional regulation. Quantitative short-read sequence data generated from experiments such as ChIP-seq [3], DNase-seq [4], FAIRE-seq [5], and ATAC-seq [6] broadly identify genomic regions that regulate gene transcription. Sequence information from these experiments can be used to detect allele-specific activity in samples where heterozygous variants are present in or near a regulatory element. Previous studies have used quantitative short-read data to correlate genetic variation in regulatory regions with nearby gene expression [7, 8] and to show the heritability of allelic regulatory effects [8, 9, 10, 11, 12].
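The abstract names a binomial test as the statistic used to call allelic imbalance at heterozygous sites. As an illustration only, the idea can be sketched as follows: at a heterozygous site, reads carrying the reference and alternate alleles are counted, and the counts are tested against the 50:50 split expected when both alleles contribute equally. This is a minimal sketch, not the AA-ALIGNER implementation. The function names, the `min_depth` filter, and the `alpha` threshold are hypothetical choices made for this example and are not taken from the paper.

```python
from math import comb


def binom_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value for k successes in n trials.

    Sums the probabilities of all outcomes that are no more likely than
    the observed one. Suitable for modest read depths only; for large n
    a library routine such as scipy.stats.binomtest is a better choice.
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance so ties in probability are counted.
    total = sum(x for x in pmf if x <= observed * (1 + 1e-7))
    return min(1.0, total)


def call_imbalance(ref_reads: int, alt_reads: int,
                   alpha: float = 0.05, min_depth: int = 10):
    """Test one heterozygous site for allelic imbalance.

    alpha and min_depth are hypothetical defaults for illustration.
    Returns None when coverage is below min_depth, otherwise a tuple
    (p_value, is_imbalanced).
    """
    depth = ref_reads + alt_reads
    if depth < min_depth:
        return None  # too few reads to test this site
    p_value = binom_two_sided_p(ref_reads, depth, 0.5)
    return p_value, p_value < alpha


if __name__ == "__main__":
    # Hypothetical allele counts (ref, alt) at three heterozygous sites.
    for ref, alt in [(30, 5), (12, 11), (3, 2)]:
        print(ref, alt, call_imbalance(ref, alt))
```

In a genome-wide analysis many sites are tested at once, so the per-site p-values would normally be corrected for multiple testing before imbalances are reported; that step is omitted here.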
