Abstract

Background: Epigenomic studies that use next-generation sequencing experiments typically rely on the alignment of reads to a reference sequence. However, because of genetic diversity and the diploid nature of the human genome, we hypothesize that using a generic reference could lead to incorrectly mapped reads and bias downstream results.

Results: We show that accounting for genetic variation using a modified reference genome or a de novo assembled genome can alter histone H3K4me1 and H3K27ac ChIP-seq peak calls, either by creating new personal peaks or by the loss of reference peaks. Using permissive cutoffs, modified reference genomes alter approximately 1% of peak calls, while de novo assembled genomes alter up to 5% of peaks. We also show statistically significant differences in the number of reads observed in regions associated with the new, altered, and unchanged peaks. We report that short insertions and deletions (indels), followed by single nucleotide variants (SNVs), have the highest probability of modifying peak calls. We show that using a graph personalized genome represents a reasonable compromise between modified reference genomes and de novo assembled genomes. We demonstrate that altered peaks have a genomic distribution typical of other peaks.

Conclusions: Analyzing epigenomic datasets with personalized and graph genomes allows the recovery of new peaks enriched for indels and SNVs. These altered peaks are more likely to differ between individuals and, as such, could be relevant in the study of various human phenotypes.
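The comparison between reference-based and personalized peak calls can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes both peak sets are already in one coordinate system (BED-style 0-based, half-open intervals), treats any overlap as a match, and defines the percentage of altered peaks as (reference-only + personal-only) divided by all distinct peaks, a hypothetical denominator chosen only for illustration. The helpers `compare_peaks` and `percent_altered` are hypothetical names.

```python
from typing import List, Tuple

# A peak as (chromosome, start, end), 0-based and half-open like BED.
Peak = Tuple[str, int, int]


def overlaps(a: Peak, b: Peak) -> bool:
    """True when two half-open intervals on the same chromosome overlap."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def compare_peaks(reference: List[Peak], personal: List[Peak]):
    """Split peaks into shared, reference-only (lost) and personal-only (new).

    Assumption: both lists use the same coordinates; any overlap counts as
    the same peak. Quadratic, which is fine for a toy example.
    """
    ref_only = [p for p in reference if not any(overlaps(p, q) for q in personal)]
    personal_only = [q for q in personal if not any(overlaps(q, p) for p in reference)]
    shared = [p for p in reference if p not in ref_only]
    return shared, ref_only, personal_only


def percent_altered(reference: List[Peak], personal: List[Peak]) -> float:
    """Percentage of peaks gained or lost (hypothetical definition)."""
    shared, ref_only, personal_only = compare_peaks(reference, personal)
    total = len(shared) + len(ref_only) + len(personal_only)
    if total == 0:
        return 0.0
    return 100.0 * (len(ref_only) + len(personal_only)) / total


if __name__ == "__main__":
    ref_peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 0, 50)]
    pers_peaks = [("chr1", 150, 250), ("chr2", 0, 50), ("chr3", 10, 20)]
    print(percent_altered(ref_peaks, pers_peaks))
```

The pairwise overlap test is quadratic; for genome-wide peak sets, a sorted sweep or an interval tree (as used by interval tools such as BEDTools) would be the usual choice.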

Highlights

  • Epigenomic studies that use next-generation sequencing experiments typically rely on the alignment of reads to a reference sequence

  • When estimating the proportion of changed mappings, we found that 3.6% of whole-genome sequencing (WGS) reads change their mapping depending on the reference used (Additional file 1: Table S1a)

  • Personal-only peaks emerge when reads shift their mapping from the reference pileup to the new personalized pileup or when reads that did not map to the reference become mapped to the personalized genome
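The read bookkeeping behind these highlights can be sketched as below. It assumes each read's alignment from the two runs has already been reduced to a (chromosome, position) pair, or `None` when unmapped, and that personalized-genome positions have been converted back to reference coordinates. The helpers `classify_reads` and `fraction_changed` and their category names are hypothetical; "moved" and "newly_mapped" mirror the two cases described above, and "lost" covers reads that map only to the reference. This is not the paper's exact definition of a changed mapping.

```python
from typing import Dict, Optional, Tuple

# (chromosome, position) of a read's primary alignment, or None if unmapped.
Alignment = Optional[Tuple[str, int]]


def classify_reads(ref_map: Dict[str, Alignment],
                   personal_map: Dict[str, Alignment]) -> Dict[str, int]:
    """Count how each read's mapping changes between the two references."""
    counts = {"unchanged": 0, "moved": 0, "newly_mapped": 0,
              "lost": 0, "unmapped_both": 0}
    for read_id in ref_map.keys() | personal_map.keys():
        r = ref_map.get(read_id)
        p = personal_map.get(read_id)
        if r is None and p is None:
            counts["unmapped_both"] += 1
        elif r is None:
            counts["newly_mapped"] += 1   # maps only to the personalized genome
        elif p is None:
            counts["lost"] += 1           # maps only to the reference
        elif r == p:
            counts["unchanged"] += 1
        else:
            counts["moved"] += 1          # mapped in both, at different loci
    return counts


def fraction_changed(counts: Dict[str, int]) -> float:
    """Fraction of all reads whose mapping differs between the two runs."""
    changed = counts["moved"] + counts["newly_mapped"] + counts["lost"]
    total = sum(counts.values())
    return changed / total if total else 0.0


if __name__ == "__main__":
    ref = {"r1": ("chr1", 100), "r2": ("chr1", 200), "r3": None, "r4": ("chr2", 5)}
    pers = {"r1": ("chr1", 100), "r2": ("chr1", 210), "r3": ("chr1", 300), "r4": None}
    c = classify_reads(ref, pers)
    print(c, fraction_changed(c))
```

In practice, the two mappings would come from two alignment files, and reads that move into or out of a peak region are the ones that can create personal-only peaks or remove reference peaks.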


Summary

Introduction

Epigenomic studies that use next-generation sequencing experiments typically rely on the alignment of reads to a reference sequence. Because of genetic diversity and the diploid nature of the human genome, we hypothesize that using a generic reference could lead to incorrectly mapped reads and bias downstream results. Standard ChIP-seq analysis relies on aligning reads to a reference sequence followed by peak calling [1, 2]. Differences between the genome under study and the reference will shift the mapping of some reads and leave others unmapped (Fig. 1a), a phenomenon known as reference bias [5]. It has already been shown that merely changing the assembly version of the reference can affect epigenomic analyses [6].
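To make reference bias concrete, the toy sketch below builds a modified reference by applying an indel and an SNV to a reference string. It then reports the best ungapped mismatch count of a read drawn from the personal sequence. Assumptions: variants are non-overlapping, positions are 0-based (unlike the 1-based POS field of VCF), and a brute-force Hamming scan stands in for a real gapped aligner. `apply_variants` and `best_mismatches` are hypothetical helpers, not tools from the paper.

```python
from typing import List, Tuple

# (0-based position, reference allele, alternate allele)
Variant = Tuple[int, str, str]


def apply_variants(reference: str, variants: List[Variant]) -> str:
    """Build a modified reference by substituting alleles.

    Variants are applied from right to left so that an indel never shifts
    the coordinates of variants still to be applied (assumes no overlaps).
    """
    seq = reference
    for pos, ref, alt in sorted(variants, key=lambda v: v[0], reverse=True):
        if seq[pos:pos + len(ref)] != ref:
            raise ValueError(f"reference allele mismatch at position {pos}")
        seq = seq[:pos] + alt + seq[pos + len(ref):]
    return seq


def best_mismatches(read: str, genome: str) -> int:
    """Fewest mismatches over all ungapped placements of the read."""
    best = len(read)
    for start in range(len(genome) - len(read) + 1):
        window = genome[start:start + len(read)]
        best = min(best, sum(a != b for a, b in zip(read, window)))
    return best


if __name__ == "__main__":
    reference = "AAAACCCCGGGGTTTT"
    # A 2 bp insertion (C -> CAT) at position 7 and an SNV (A -> G) at position 1.
    personal = apply_variants(reference, [(7, "C", "CAT"), (1, "A", "G")])
    read = "CCCCATGGGG"  # a read spanning the insertion in the personal genome
    print(personal, best_mismatches(read, personal), best_mismatches(read, reference))
```

The read spanning the insertion matches the personalized sequence exactly but carries several mismatches at every placement on the reference, which is the kind of penalty that can move a read to another locus or leave it unmapped.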
