Abstract

BackgroundNext-generation sequencing (NGS) is widely used for genome-wide identification and quantification of DNA elements involved in the regulation of gene transcription. Studies that generate multiple high-throughput NGS datasets require data integration methods for two general tasks: 1) generation of genome-wide data tracks representing an aggregate of multiple replicates of the same experiment; and 2) combination of tracks from different experimental types that provide complementary information regarding the location of genomic features such as enhancers.ResultsNGS-Integrator is a Java-based command line application, facilitating efficient integration of multiple genome-wide NGS datasets. NGS-Integrator first transforms all input data tracks using the complement of the minimum Bayes’ factor so that all values are expressed in the range [0,1] representing the probability of a true signal given the background noise. Then, NGS-Integrator calculates the joint probability for every genomic position to create an integrated track. We provide examples using real NGS data generated in our laboratory and from the mouse ENCODE database.ConclusionsOur results show that NGS-Integrator is both time- and memory-efficient. Our examples show that NGS-Integrator can integrate information to facilitate downstream analyses that identify functional regulatory domains along the genome.

Highlights

  • Next-generation sequencing (NGS) is widely used for genome-wide identification and quantification of DNA elements involved in the regulation of gene transcription

  • The Calculator element calculates the complement of the minimum Bayes factor at each position i, estimating ni as the median of the ri values across a window of i values straddling the position at which pi is being calculated

  • Example 1:Integration of multiple-replicate transcription factor ChIP-Seq data To provide an example of an application of next-generation sequencing (NGS)-Integrator for genome-wide NGS data for multiple replicates of the same experiment, we processed previously unpublished datasets obtained from antibody-based chromatin immunoprecipitation followed by NGS (ChIP-Seq) using an antibody to the transcription factor ELF1 (Fig. 1b)

Read more

Summary

Introduction

Next-generation sequencing (NGS) is widely used for genome-wide identification and quantification of DNA elements involved in the regulation of gene transcription. Genome-wide next-generation sequencing (NGS) can provide information about binding of proteins to DNA, chromatin modifications, and chromatin accessibility. These factors are under intense study because they determine what genes are expressed in a given cell type and how gene expression is regulated in different physiological or pathophysiological states. Our proposed method, called NGS-Integrator, transforms genome-wide NGS data for identification of genomic regulatory elements and chromatin modifications (e.g. ChIP-Seq, ATAC-Seq, and Bisulfite-Seq) to a probability vector using the complement of the minimum Bayes’ factor for each track, followed by calculation of joint probabilities for the tracks as a function of position along the genome. The integrated tracks can be used as input to peak calling programs to identify specific features with greater specificity

Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call