Abstract

Different ChIP-seq peak callers often produce different results from the same input. Because peak callers are known to yield differentially enriched peaks with large variance in peak length distribution and total peak count, accurately annotating peak lists with their nearest genes can be an arduous process. Functional genomic annotation of histone modification ChIP-seq data is particularly challenging, as chromatin marks with inherently broad peaks and a diffuse range of signal enrichment (e.g., H3K9me1, H3K27me3) differ significantly from marks with narrow peaks that exhibit a compact, localized enrichment pattern (e.g., H3K4me3, H3K9ac). In addition, varying degrees of tissue-dependent broadness of an epigenetic mark can make it difficult to link sequencing data to biological function accurately and reliably. Thus, there is an unmet need for software that can tailor the computational analysis of a ChIP-seq dataset to the specific peak coordinates of the data and their surrounding genomic features. geneXtendeR optimizes the functional annotation of ChIP-seq peaks by exploring relative differences in annotating ChIP-seq peak sets to variable-length gene bodies. In contrast to prior techniques, geneXtendeR considers peak annotations beyond just the closest gene, allowing users to investigate peak summary statistics for the first-closest gene, second-closest gene, ..., nth-closest gene, while ranking the output according to biologically relevant events and iteratively comparing the fidelity of peak-to-gene overlap across a user-defined range of upstream and downstream extensions of each gene's original boundaries. We tested geneXtendeR on 547 human transcription factor ChIP-seq ENCODE datasets and 198 human histone modification ChIP-seq ENCODE datasets, and provide the analysis results as case studies.
The geneXtendeR R/Bioconductor package (including detailed introductory vignettes) is released under the GPL-3 open-source license and is freely available from Bioconductor at: https://bioconductor.org/packages/geneXtendeR/
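
The gene-extension idea described in the abstract can be illustrated with a small, self-contained sketch. It is written in Python purely for illustration and does not use or reproduce the geneXtendeR API: the names `Gene`, `Peak`, `extend`, `distance`, `peaks_overlapping`, and `nth_closest` are hypothetical, the coordinates are toy values on a single chromosome, and intervals are assumed to be 1-based and closed. The sketch widens each gene's boundaries in a strand-aware way, counts how many peaks overlap at least one extended gene for several extension sizes, and ranks genes by distance to a peak so that the nth-closest gene can be looked up.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    start: int
    end: int
    strand: str  # "+" or "-"


class Peak(NamedTuple):
    start: int
    end: int


def extend(gene: Gene, upstream: int, downstream: int = 0) -> Gene:
    """Widen a gene's boundaries; 'upstream' depends on the strand."""
    if gene.strand == "+":
        start, end = gene.start - upstream, gene.end + downstream
    else:
        start, end = gene.start - downstream, gene.end + upstream
    return gene._replace(start=max(1, start), end=end)


def distance(peak: Peak, gene: Gene) -> int:
    """Gap in bp between a peak and a gene; 0 means they overlap."""
    return max(0, gene.start - peak.end, peak.start - gene.end)


def peaks_overlapping(peaks, genes, upstream: int, downstream: int = 0) -> int:
    """Count peaks that overlap at least one extended gene."""
    extended = [extend(g, upstream, downstream) for g in genes]
    return sum(any(distance(p, g) == 0 for g in extended) for p in peaks)


def nth_closest(peak: Peak, genes, n: int) -> Gene:
    """Return the nth-closest gene to a peak (n = 1 is the closest)."""
    ranked = sorted(genes, key=lambda g: distance(peak, g))
    return ranked[n - 1]


# Toy data (assumed values, one chromosome).
genes = [Gene("A", 1000, 2000, "+"), Gene("B", 5000, 6000, "-")]
peaks = [Peak(400, 600), Peak(1500, 1600), Peak(6300, 6400), Peak(9000, 9100)]

# Compare peak-to-gene overlap across a range of upstream extensions.
for upstream in (0, 250, 500):
    print(upstream, peaks_overlapping(peaks, genes, upstream))
```

With these toy coordinates, only one peak overlaps a gene at extensions of 0 and 250 bp, while a 500 bp upstream extension captures three of the four peaks. Scanning a range of extensions in this way is the kind of comparison the abstract describes, though the package's own ranking and statistics are not modeled here.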

Introduction

The field of epigenetic research studies the process by which heritable changes in gene expression occur without underlying alterations in the DNA sequence. Chromatin marks come in a variety of shapes and sizes, ranging from the extremely broad to the extremely narrow[2,3,4,5,6]. This spectrum depends on a number of biological factors, from qualitative characteristics such as tissue type[7] to temporal aspects such as developmental stage[8]. On the computational side, the peak coordinates (peak start, peak end) reported for a sample vary with the peak caller used, both in the length distribution of peaks and in the total number of peaks called, and this variance persists even when samples are run at identical default parameter values[9,10]. The variance matters when annotating peak lists genome-wide with their nearest genes, because peaks can be shifted in genomic position (towards the 5’ or 3’ end) or differ in length depending on the peak caller employed. The combined effect of these factors exerts a unique influence over the functional annotation and understanding of genomic variability, which complicates the study of epigenetic regulation of biological function.
