Abstract

The most popular model for searching ChIP-seq data for transcription factor binding sites (TFBS) is the positional weight matrix (PWM). However, this model does not take into account dependencies between nucleotide occurrences at different site positions. Two recently proposed models, BaMM and InMoDe, do account for such dependencies. However, the application of these models has usually been limited to comparing their recognition accuracy with that of PWMs, and no analysis of the co-prediction and relative positioning of hits of different models in peaks has yet been performed. To close this gap, we propose a pipeline called MultiDeNA. The pipeline includes the stages of model training, assessment of recognition accuracy, scanning of ChIP-seq peaks, and classification of peaks based on the scan results. We applied the pipeline to 22 ChIP-seq datasets for the TF FOXA2 and considered the PWM, dinucleotide PWM (diPWM), BaMM and InMoDe models. Combining these four models significantly increased the fraction of recognized peaks compared with the PWM model alone: the increase was 26.3 %. The BaMM model made the main contribution to the recognition of sites. Although the major fraction of predicted peaks contained TFBS of different models at coinciding positions, the median fractions of peaks containing predictions of only a single model were 1.08, 0.49, 4.15 and 1.73 % for PWM, diPWM, BaMM and InMoDe, respectively. Thus, FOXA2 BSs were not fully described by any single model, which indicates their heterogeneity. We assume that the BaMM model is the most successful in describing the structure of the FOXA2 BS in the ChIP-seq datasets under study.
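The statement that a PWM ignores dependencies between site positions can be illustrated with a minimal sketch. The matrix below is a made-up toy for a hypothetical 3-bp site, not FOXA2 data, and the names `TOY_FREQS`, `BACKGROUND` and `pwm_score` are illustrative assumptions rather than part of MultiDeNA. The point is only that a PWM score is a sum of per-position terms, so the effect of changing one nucleotide never depends on the nucleotides at other positions.

```python
import math

# Toy nucleotide frequencies for a hypothetical 3-bp site (illustrative only).
# Each entry holds the nucleotide probabilities at one site position.
TOY_FREQS = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.4, "C": 0.1, "G": 0.4, "T": 0.1},
]
BACKGROUND = 0.25  # assumed uniform background frequency


def pwm_score(site, freqs=TOY_FREQS, background=BACKGROUND):
    """Return the log-odds score of a site as a sum of independent per-position terms."""
    if len(site) != len(freqs):
        raise ValueError("site length must match the matrix length")
    return sum(math.log2(col[nt] / background) for col, nt in zip(freqs, site))


# Changing position 3 from A to C shifts the score by the same amount
# whatever nucleotide occupies position 1.
shift_with_a = pwm_score("ATA") - pwm_score("ATC")
shift_with_c = pwm_score("CTA") - pwm_score("CTC")
```

Models such as diPWM, BaMM and InMoDe relax this assumption by letting the contribution of a position depend on other positions; the sketch does not attempt to reproduce them.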

Highlights

  • Transcription factors (TFs) are proteins that can recognize certain regions of genomic DNA (TF binding sites, TFBS) (Lambert et al, 2018)

  • Classification of ChIP-seq peaks based on the results of TFBS recognition by different models

  • The first one takes into account the intersection of positions of predicted TFBS of different models; the second one does not
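The idea of classifying peaks by the scan results of several models, with and without regard to whether the predicted sites intersect, can be sketched as follows. This is a minimal illustration under stated assumptions, not the MultiDeNA implementation: the input layout (a dict from model name to a list of half-open `(start, end)` hit intervals), the label strings, and the functions `intervals_overlap` and `classify_peak` are all hypothetical.

```python
def intervals_overlap(a, b):
    """Return True if two half-open (start, end) intervals share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def classify_peak(hits):
    """Classify one peak from the hits of several models.

    hits maps a model name to a list of (start, end) hit intervals in the peak.
    Returns (models_with_hits, label): the first element ignores hit positions,
    the second also says whether hits of different models intersect.
    """
    models = tuple(sorted(m for m, sites in hits.items() if sites))
    if not models:
        return models, "unrecognized"
    if len(models) == 1:
        return models, "single model"
    # Compare every pair of models and look for any intersecting pair of hits.
    for i, m1 in enumerate(models):
        for m2 in models[i + 1:]:
            if any(intervals_overlap(a, b) for a in hits[m1] for b in hits[m2]):
                return models, "several models, overlapping sites"
    return models, "several models, distinct sites"


# Toy peaks with made-up coordinates (illustrative only).
toy_peaks = {
    "peak_1": {"PWM": [(10, 22)], "diPWM": [], "BaMM": [(12, 24)], "InMoDe": []},
    "peak_2": {"PWM": [], "diPWM": [], "BaMM": [(40, 52)], "InMoDe": []},
    "peak_3": {"PWM": [(5, 17)], "diPWM": [], "BaMM": [], "InMoDe": [(60, 72)]},
    "peak_4": {"PWM": [], "diPWM": [], "BaMM": [], "InMoDe": []},
}
labels = {name: classify_peak(hits) for name, hits in toy_peaks.items()}
```

Counting the peaks that fall into the "single model" class per model would give fractions analogous to those reported in the abstract, although the exact criteria used by the pipeline may differ from this sketch.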

Summary

Introduction

Transcription factors (TFs) are proteins that can recognize certain regions of genomic DNA (TF binding sites, TFBS) (Lambert et al, 2018). The main function of TFs is to increase or decrease the level of gene transcription (Latchman, 2001). The key stage in the regulation of gene expression is the binding of a TF to DNA. This binding initiates a chain of molecular events that ensure the assembly and regulate the activity of the pre-initiation complex of RNA polymerase II, both through direct or indirect contacts with the components of this complex and through the recruitment of various chromatin-modifying and chromatin-remodeling proteins. One of the most important tasks of modern molecular biology is to identify genomic TFBSs
