Abstract

Position weight matrices (PWMs) are commonly used in computational prediction of transcription factor binding sites (TFBSs). Although these matrices predict actual binding sites more accurately than simple consensus sequences, they usually produce a large number of false positive (FP) predictions and are therefore a poor source of information. Several studies have employed additional sources of information, such as sequence conservation or proximity to transcription start sites, to distinguish true binding regions from random ones. Recently, the spatial distribution of modified nucleosomes has been shown to be associated with different promoter architectures. These aligned patterns can facilitate DNA accessibility for transcription factors. We hypothesize that using data from these aligned and periodic patterns can improve the performance of binding region prediction. In this study, we propose two features, “modified nucleosomes neighboring” and “modified nucleosomes occupancy”, to decrease the number of FPs in binding site discovery. Based on these features, we designed a logistic regression classifier that estimates the probability that a region is a TFBS. The model was trained on Sp1 binding sites on chromosome 1 and tested on the other chromosomes in human CD4+ T cells. We investigated 21 histone modifications and found that only 8 of the 21 marks are strongly correlated with transcription factor binding regions. To show that these features are not specific to Sp1, we combined the logistic regression classifier with the PWM to create a new model for searching for TFBSs across the genome. We tested this model on the transcription factors MAZ, PU.1 and ELF1 and compared the results with those obtained using the PWM alone. The results show that our model predicts transcription factor binding regions more successfully. The relative simplicity of the model and its capability of integrating other features make it a superior method for TFBS prediction.
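As a rough illustration of the approach summarized above, the following minimal Python sketch scans a sequence with a PWM and then passes the window score, together with two chromatin feature values, through a logistic function. It is not the authors' implementation: the 4-position matrix, the weights, the bias, and the function names `pwm_scan` and `binding_probability` are all hypothetical, and the `mnn` and `mno` arguments merely stand in for the "modified nucleosomes neighboring" and "modified nucleosomes occupancy" features, whose exact computation is not given here.

```python
import math

# Hypothetical 4-position PWM of log-odds scores (illustrative values only,
# not the matrix of Sp1 or of any other real transcription factor).
PWM = [
    {"A": -1.0, "C": -1.0, "G": 1.2, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": 1.2, "T": -1.0},
    {"A": -1.0, "C": 1.2, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": 1.2, "T": -1.0},
]


def pwm_scan(seq, pwm):
    """Return (offset, score) for every window of len(pwm) bases in seq."""
    width = len(pwm)
    return [
        (i, sum(pwm[j][seq[i + j]] for j in range(width)))
        for i in range(len(seq) - width + 1)
    ]


def sigmoid(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def binding_probability(pwm_score, mnn, mno, weights, bias):
    """Logistic model over the PWM score and two chromatin features.

    weights and bias are assumed to come from a previously fitted model;
    the values used in the example below are made up.
    """
    z = bias + weights[0] * pwm_score + weights[1] * mnn + weights[2] * mno
    return sigmoid(z)


if __name__ == "__main__":
    hits = pwm_scan("AAGGCGTT", PWM)
    offset, score = max(hits, key=lambda hit: hit[1])
    # Made-up coefficients and feature values, for illustration only.
    prob = binding_probability(score, mnn=0.8, mno=0.3,
                               weights=(0.5, 1.5, 1.0), bias=-3.0)
    print(offset, round(score, 2), round(prob, 3))
```

In a sketch like this, a window with a high PWM score but weak chromatin support receives a lower probability than the same window with strong support, which is the intuition behind using such features to reduce false positives.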

Highlights

  • Gene regulation is affected by the binding of transcription factors (TFs) to regulatory sequences in DNA

  • We have examined the effects of two features, “modified nucleosomes neighboring” (MNN) and “modified nucleosomes occupancy” (MNO), around transcription factor binding sites (TFBSs)

  • Through the evaluation of the MNN feature, eight significant histone modifications were identified for TFBS prediction


Summary

Introduction

Gene regulation is affected by the binding of transcription factors (TFs) to regulatory sequences in DNA. Identifying transcription factor binding sites (TFBSs) improves insight into the genes regulated by a TF. Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-Seq) [11,12,13,14] or by array hybridization (ChIP-chip) [15] are two promising high-throughput technologies for identifying TF binding locations [13,15,16,17,18,19,20,21]. These technologies have been used successfully to map binding locations in several organisms, but some of their properties, such as tissue and condition specificity, dependence on the availability of antibodies for the TFs under study, and the expense of the experiments, have made them useful for only a limited number of TFs [2,7,10]. The use of computational approaches to identify binding sites therefore seems inevitable [1,2,3,4,5,7,9,10].
