Abstract
Establishing the architecture of gene regulatory networks (GRNs) relies on chromatin immunoprecipitation followed by massively parallel sequencing (ChIP-Seq) methods that provide genome-wide transcription factor binding sites (TFBSs). ChIP-Seq furnishes millions of short reads that, after alignment, describe the genome-wide binding sites of a particular TF. However, in all organisms investigated an average of 40% of reads fail to align to the corresponding genome, with some datasets having as much as 80% of reads failing to align. We describe here the provenance of previously unaligned reads in ChIP-Seq experiments from animals and plants. We show that a substantial portion corresponds to sequences of bacterial and metazoan origin, irrespective of the ChIP-Seq chromatin source. Unforeseen was the finding that 30%–40% of unaligned reads were actually alignable. To validate these observations, we investigated the characteristics of the previously unaligned reads corresponding to TAL1, a human TF involved in lineage specification of hemopoietic cells. We show that, while unmapped ChIP-Seq read datasets contain foreign DNA sequences, additional TFBSs can be identified from the previously unaligned ChIP-Seq reads. Our results indicate that the re-evaluation of previously unaligned reads from ChIP-Seq experiments will significantly contribute to TF target identification and determination of emerging properties of GRNs.
Highlights
Establishing the architecture of gene regulatory networks (GRNs) relies on chromatin immunoprecipitation followed by massively parallel sequencing (ChIP-Seq) methods that provide genome-wide transcription factor binding sites (TFBSs)
We investigated the characteristics of the previously unaligned reads corresponding to TAL1, a human transcription factors (TFs) involved in lineage specification of hemopoietic cells
Our results indicate that the re-evaluation of previously unaligned reads from Chromatin immunoprecipitation (ChIP)-Seq experiments will significantly contribute to TF target identification and determination of emerging properties of GRNs
Summary
Establishing the architecture of gene regulatory networks (GRNs) relies on chromatin immunoprecipitation followed by massively parallel sequencing (ChIP-Seq) methods that provide genome-wide transcription factor binding sites (TFBSs). Our results indicate that the re-evaluation of previously unaligned reads from ChIP-Seq experiments will significantly contribute to TF target identification and determination of emerging properties of GRNs. Chromatin immunoprecipitation (ChIP) followed by high-throughput sequencing (ChIP-Seq) allows the in vivo characterization of genome-wide maps of protein-DNA interactions and epigenetic modifications. An in-depth ChIP-Seq analysis of one such potentially legitimate human reads dataset resulted in identification of novel TAL1 binding sites These findings are important because the use of legitimate previously unaligned reads in identifying additional TFBSs results in the discovery of new target genes that could enhance construction of gene regulatory grids and networks. All the sources of bias affecting results from ChIP-Seq data are far from being characterized In this regard, expanding exploration to different portions of the data and/or including information from a wide group of techniques, organisms, and analysis pipelines is likely to uncover other limitations associated with ChIP-Seq experiments
Published Version (Free)
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.