An effective approach for identification of in vivo protein-DNA binding sites from paired-end ChIP-Seq data

Congmao Wang, Dasheng Zhang, Zoe A Wilson, Dabing Zhang, Jie Xu

doi:10.1186/1471-2105-11-81

Open Access

https://doi.org/10.1186/1471-2105-11-81

Abstract

Background: ChIP-Seq, which combines chromatin immunoprecipitation (ChIP) with high-throughput massively parallel sequencing, is increasingly used to identify protein-DNA interactions in vivo across the genome. However, maximizing the effectiveness of analysis of such sequence data requires new algorithms that can accurately predict DNA-protein binding sites.

Results: Here, we present SIPeS (Site Identification from Paired-end Sequencing), a novel algorithm for precise identification of binding sites from short reads generated by paired-end Solexa ChIP-Seq technology. We used ChIP-Seq data from the Arabidopsis basic helix-loop-helix transcription factor ABORTED MICROSPORES (AMS), which is expressed within the anther during pollen development. The results show that SIPeS has better resolution for binding site identification than two existing ChIP-Seq peak detection algorithms, Cisgenome and MACS.

Conclusions: When compared to Cisgenome and MACS, SIPeS shows better resolution for binding site discovery. Moreover, SIPeS is designed to calculate the mappable genome length accurately, using the fragment length derived from the paired-end reads. Dynamic baselines are also employed to discriminate closely adjacent binding sites, which is of particular value when working with high-density genomes.
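
The abstract does not spell out how the dynamic baselines work. As a rough illustration only, one possible reading is that a fixed cutoff merges two nearby peaks into one region, whereas a baseline that is raised locally can expose the valley between them. The sketch below is an assumption made for illustration and is not the SIPeS algorithm; the function names `regions_above` and `split_peaks`, the `step` parameter, and the toy signal are all hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Region = Tuple[int, int]  # half-open [start, end) in pileup coordinates


def regions_above(signal: Sequence[int], baseline: int,
                  lo: int = 0, hi: Optional[int] = None) -> List[Region]:
    """Runs within [lo, hi) where the signal is strictly above the baseline."""
    hi = len(signal) if hi is None else hi
    runs: List[Region] = []
    start = None
    for i in range(lo, hi):
        if signal[i] > baseline:
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, hi))
    return runs


def split_peaks(signal: Sequence[int], baseline: int, step: int = 1,
                lo: int = 0, hi: Optional[int] = None) -> List[Region]:
    """Raise the baseline inside each region until it splits or vanishes.

    Illustrative assumption, not the published method: a region that
    vanishes without ever splitting is reported as a single peak.
    """
    peaks: List[Region] = []
    for start, end in regions_above(signal, baseline, lo, hi):
        level = baseline
        subs = [(start, end)]
        while len(subs) == 1:
            level += step
            subs = regions_above(signal, level, start, end)
        if not subs:
            peaks.append((start, end))  # one summit only
        else:
            for sub_start, sub_end in subs:
                peaks.extend(split_peaks(signal, level, step, sub_start, sub_end))
    return peaks


# Toy pileup with two summits separated by a shallow valley.
toy_signal = [0, 1, 3, 5, 3, 2, 3, 6, 4, 1, 0]
fixed = regions_above(toy_signal, baseline=1)
dynamic = split_peaks(toy_signal, baseline=1)
print(fixed)    # [(2, 9)]
print(dynamic)  # [(2, 5), (6, 9)]
```

With a fixed baseline of 1 the two summits are reported as one region, while the locally raised baseline separates them. A real caller would also need to guard against splitting on noise, which this sketch does not attempt.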

Highlights

ChIP-Seq, which combines chromatin immunoprecipitation (ChIP) with high-throughput massively parallel sequencing, is increasingly being used for identification of protein-DNA interactions in vivo in the genome
Using the preprocessing program of SIPeS, the effective genome size (the genome coverage calculated from uniquely mapped reads) of the Arabidopsis genome is 111,755,668 bp, about 93% of the whole genome length, in our ABORTED MICROSPORES (AMS) experiment
In this paper we present an algorithm, SIPeS, that can be used to calculate the effective genome size and to precisely identify binding sites from short reads generated by paired-end Solexa ChIP-Seq technology
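
The highlights describe the effective genome size as the genome coverage calculated from uniquely mapped reads. A minimal sketch of that idea, assuming each uniquely mapped read pair has already been reduced to a half-open fragment interval on one chromosome, is to merge overlapping intervals and sum their lengths. This is not the SIPeS preprocessing program; `merge_intervals`, `covered_length`, and the example coordinates are hypothetical.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on a single chromosome


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or touching half-open intervals."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or abuts the previous interval: extend it.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def covered_length(intervals: Iterable[Interval]) -> int:
    """Number of bases covered by at least one interval."""
    return sum(end - start for start, end in merge_intervals(intervals))


# Hypothetical fragments from uniquely mapped read pairs.
example_fragments = [(100, 300), (250, 400), (1000, 1200)]
covered = covered_length(example_fragments)
print(covered)  # 500
```

Summing `covered_length` over all chromosomes and dividing by the total genome length would give a coverage fraction comparable to the roughly 93% reported above.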

Summary

Introduction

ChIP-Seq, which combines chromatin immunoprecipitation (ChIP) with high-throughput massively parallel sequencing, offers a genome-wide approach to determining the chromosomal binding sites of DNA-associated proteins and is increasingly used to identify protein-DNA interactions in vivo. The massive amounts of data generated by high-throughput sequencing pose great challenges for the identification of protein binding sites, and making full use of such data requires new algorithms that can accurately predict them. Paired-end reads allow each underlying DNA fragment to be identified more precisely. Paired-end sequencing data therefore has the potential to increase the accuracy of binding site identification, because the fragment length as well as the effective genome length can be computed accurately.
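
To make the point about paired-end reads concrete: when both mates of a pair are mapped, the fragment spans from the leftmost base of one mate to the rightmost base of the other, so its length is observed directly and no read-extension estimate is needed. The sketch below builds a per-base fragment pileup under that assumption. It is illustrative only and not taken from SIPeS; `fragment_pileup`, `mean_fragment_length`, and the toy coordinates are hypothetical.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple

Fragment = Tuple[int, int]  # half-open [start, end), one per mapped read pair


def fragment_pileup(fragments: Sequence[Fragment], chrom_length: int) -> List[int]:
    """Count, for every base, how many fragments cover it."""
    diff = [0] * (chrom_length + 1)
    for start, end in fragments:
        diff[start] += 1  # coverage rises where the fragment begins
        diff[end] -= 1    # and falls just past its last base
    # A running sum of the difference array gives per-base coverage.
    return list(accumulate(diff[:-1]))


def mean_fragment_length(fragments: Sequence[Fragment]) -> float:
    """Average observed fragment length across read pairs."""
    return sum(end - start for start, end in fragments) / len(fragments)


# Two hypothetical overlapping fragments on a 10 bp toy chromosome.
toy_fragments = [(2, 6), (4, 8)]
pileup = fragment_pileup(toy_fragments, chrom_length=10)
print(pileup)                               # [0, 0, 1, 1, 2, 2, 1, 1, 0, 0]
print(mean_fragment_length(toy_fragments))  # 4.0
```

The difference-array form keeps the cost linear in the number of fragments plus the chromosome length, which matters at the data volumes the introduction describes.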

Journal: BMC Bioinformatics
Publication Date: Feb 9, 2010
Citations: 45
License type: cc-by
