Abstract

Background: High-throughput in vivo protein-DNA interaction experiments are currently widely used in gene regulation studies. Comprehensive analysis of the resulting data remains a challenge, however, so most computational methods consider only the top few hundred or thousand strongest protein binding sites and ignore weak binding sites entirely.

Results: A new biophysical model of protein-DNA interactions, BayesPI2+, was developed to address these challenges. BayesPI2+ can be run in either a serial computation model or a parallel ensemble learning framework. It allowed us to analyze all binding sites of the transcription factors, including weak binding sites that cannot be analyzed by other models. It was evaluated on both synthetic and real in vivo protein-DNA binding experiments. Analysing ESR1 in breast carcinoma cell lines and SPIB in activated B cell-like diffuse large B-cell lymphoma cell lines revealed that concerted binding to high and low affinity sites correlates best with gene expression.

Conclusions: BayesPI2+ allows us to analyze transcription factor binding on a larger scale than previously achieved. Through this analysis, we were able to demonstrate that genes are regulated by concerted binding to high and low affinity binding sites. The program and output results are publicly available at: http://folk.uio.no/junbaiw/BayesPI2Plus.

Highlights

  • High-throughput in vivo protein-DNA interaction experiments are currently widely used in gene regulation studies

  • Our analysis showed that binding of a transcription factor (TF) to both type I and type II binding sites is important for gene expression

  • The rates were estimated by comparing known direct TF binding sites with the predicted direct TF binding sites (i.e. those identified by applying the fuzzy neural gas algorithm to differential binding affinity)

Summary

Introduction

High-throughput in vivo protein-DNA binding experiments such as ChIP-chip and ChIP-seq are currently widely used to study gene regulation. Identifying transcription factor (TF) binding sites is an essential step towards understanding TF function and gene regulatory networks [1]. In such analyses, the raw reads of a ChIP-seq experiment are mapped to a human reference genome, and a peak-calling program is then used to detect putative TF binding sites. The identified TF binding sites depend on the threshold value used by the peak-calling program.
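The threshold dependence of peak calling can be illustrated with a toy sketch. This is not the procedure of any real peak caller, nor of BayesPI2+: the function `call_peaks`, the read-depth values, and both thresholds are hypothetical and chosen only to show that a stricter cutoff keeps the strongest site while a more lenient one also reports weaker sites.

```python
from typing import List, Tuple


def call_peaks(coverage: List[int], threshold: int) -> List[Tuple[int, int]]:
    """Return half-open intervals [start, end) where read depth >= threshold.

    A deliberately simplified stand-in for a peak caller (assumption):
    real tools model background noise and compute significance scores.
    """
    peaks = []
    start = None
    for pos, depth in enumerate(coverage):
        if depth >= threshold and start is None:
            start = pos                    # a new peak begins here
        elif depth < threshold and start is not None:
            peaks.append((start, pos))     # the current peak ends before pos
            start = None
    if start is not None:                  # peak runs to the end of the window
        peaks.append((start, len(coverage)))
    return peaks


# Made-up read depth along a short genomic window (illustrative only).
coverage = [1, 2, 9, 12, 8, 2, 1, 4, 5, 4, 1, 0, 3, 3, 1]

strict = call_peaks(coverage, threshold=8)   # only the strongest site
lenient = call_peaks(coverage, threshold=3)  # strong and weak sites

print("strict :", strict)
print("lenient:", lenient)
```

With these made-up numbers the strict threshold reports one site and the lenient threshold reports three, so any downstream analysis sees a different set of binding sites depending on the cutoff.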

Methods
Results
Discussion
Conclusion
