Abstract

Background: ChIP-Seq, which combines chromatin immunoprecipitation (ChIP) with high-throughput massively parallel sequencing, is increasingly used to identify protein-DNA interactions in vivo across the genome. Maximizing the effectiveness of analysing such data, however, requires new algorithms that can accurately predict DNA-protein binding sites.

Results: Here we present SIPeS (Site Identification from Paired-end Sequencing), a novel algorithm for precise identification of binding sites from short reads generated by paired-end Solexa ChIP-Seq technology. Using ChIP-Seq data for the Arabidopsis basic helix-loop-helix transcription factor ABORTED MICROSPORES (AMS), which is expressed within the anther during pollen development, we show that SIPeS has better resolution for binding site identification than two existing ChIP-Seq peak detection algorithms, Cisgenome and MACS.

Conclusions: When compared to Cisgenome and MACS, SIPeS shows better resolution for binding site discovery. Moreover, SIPeS uses the fragment lengths derived from paired-end reads to calculate the mappable genome length accurately. It also employs dynamic baselines to discriminate closely adjacent binding sites, which is of particular value when working with high-density genomes.

Highlights

  • In our ABORTED MICROSPORES (AMS) experiment, the SIPeS preprocessing program gives an effective genome size for Arabidopsis of 111,755,668 bp, about 93% of the whole genome length; the effective genome size is the genome coverage calculated from uniquely mapped reads

  • We present SIPeS, an algorithm for calculating the effective genome size and precisely identifying binding sites from short reads generated by paired-end Solexa ChIP-Seq technology
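
The effective genome size is described here only as the genome coverage calculated from uniquely mapped reads. One plausible reading, sketched below in Python, is to count the bases covered by at least one uniquely mapped fragment. The function name `effective_genome_size`, the `(chrom, start, end)` tuple layout, and the toy coordinates are assumptions made for illustration; this is not the SIPeS preprocessing program itself.

```python
def effective_genome_size(fragments):
    """Count bases covered by at least one fragment.

    fragments: iterable of (chrom, start, end) tuples using 0-based,
    half-open coordinates (an assumed layout, not the SIPeS input format).
    """
    by_chrom = {}
    for chrom, start, end in fragments:
        by_chrom.setdefault(chrom, []).append((start, end))

    covered = 0
    for intervals in by_chrom.values():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:
                # Overlapping or touching interval: extend the current run.
                cur_end = max(cur_end, end)
            else:
                # Gap found: close the current run and start a new one.
                covered += cur_end - cur_start
                cur_start, cur_end = start, end
        covered += cur_end - cur_start
    return covered


# Toy data: two overlapping fragments and one isolated fragment on chr1,
# plus one fragment on chr2.
toy_fragments = [
    ("chr1", 0, 100),
    ("chr1", 50, 150),
    ("chr1", 300, 400),
    ("chr2", 10, 60),
]
toy_genome_length = 1000  # hypothetical total length of chr1 + chr2

size = effective_genome_size(toy_fragments)
print(size, size / toy_genome_length)
```

By the same ratio, the reported figures (111,755,668 bp at about 93%) imply a whole genome length of roughly 120 Mb.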

Summary

Introduction

ChIP-Seq, which combines chromatin immunoprecipitation (ChIP) with high-throughput massively parallel sequencing, offers a genome-wide approach to determining the chromosome binding sites of DNA-associated proteins and is increasingly used to identify protein-DNA interactions in vivo. The massive amounts of data generated by high-throughput sequencing pose great challenges for identifying protein binding sites, and analysing them effectively requires new algorithms that can accurately predict DNA-protein binding sites. Paired-end sequencing data have the potential to increase the accuracy of binding site identification: because both ends of each DNA fragment are read, the corresponding fragment can be located more precisely, and the fragment length as well as the effective genome length can be computed accurately.
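
To illustrate why paired-end reads help, the Python sketch below treats each mate pair as a complete fragment, piles the fragments up per base, and reports the position of highest coverage. The helper names (`fragment_pileup`, `summit`), the coordinate convention, and the toy data are all assumptions; this is a generic pileup, not the SIPeS algorithm, and it omits the dynamic baselines mentioned in the abstract.

```python
def fragment_pileup(pairs, region_length):
    """Per-base fragment depth over a region.

    pairs: list of (fragment_start, fragment_end), 0-based half-open, where
    the start comes from the left mate and the end from the right mate.
    """
    # Difference array: +1 where a fragment begins, -1 just past its end.
    diff = [0] * (region_length + 1)
    for start, end in pairs:
        diff[start] += 1
        diff[end] -= 1

    depth = []
    running = 0
    for delta in diff[:region_length]:
        running += delta
        depth.append(running)
    return depth


def summit(depth):
    """Return (position, height) at the middle of the highest plateau."""
    best = max(depth)
    positions = [i for i, d in enumerate(depth) if d == best]
    return positions[len(positions) // 2], best


# Toy mate pairs around a hypothetical binding site.
toy_pairs = [(10, 60), (30, 85), (40, 90), (45, 100)]

# With paired-end data the fragment length is observed, not estimated.
mean_fragment_length = sum(end - start for start, end in toy_pairs) / len(toy_pairs)

depth = fragment_pileup(toy_pairs, region_length=120)
position, height = summit(depth)
print(mean_fragment_length, position, height)
```

With single-end reads the fragment end would have to be inferred from an assumed average length; here each fragment's true extent enters the pileup directly.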
