Abstract

Transcription factor binding site (TFBS) identification plays an important role in deciphering gene regulatory codes. Comprehensive knowledge of TFBSs helps reveal the molecular mechanisms of gene regulation. In recent decades, various computational approaches have been proposed to predict TFBSs in the genome. For each TF, every algorithm produces a ranked list of predicted TFBSs in which the top-ranked sites are the most statistically significant ones. However, whether these statistically significant TFBSs are functional (i.e. biologically relevant) remains unknown. Here we develop a post-processor, called the functional propensity calculator (FPC), that assigns a functional propensity to each TFBS in existing computationally predicted TFBS datasets. Functional TFBSs are known to show a strong positional preference towards the transcriptional start site (TSS), which motivates us to use TFBS position relative to the TSS as the key idea in building our FPC. Based on the calculated functional propensities, the TFBSs of a TF in the original dataset can be reordered so that the top-ranked TFBSs are those with high functional propensities. To validate the biological significance of our results, we perform three published statistical tests assessing the enrichment of Gene Ontology (GO) terms, the enrichment of physical protein-protein interactions, and the tendency to be co-expressed. The top-ranked TFBSs in our reordered dataset outperform the top-ranked TFBSs in the original dataset, justifying the effectiveness of our post-processor in extracting functional TFBSs from the original dataset. More importantly, assigning functional propensities to putative TFBSs enables biologists to easily identify which TFBSs in a promoter of interest are likely to be biologically relevant and are therefore good candidates for detailed experimental investigation.
The FPC is implemented as a web tool at http://santiago.ee.ncku.edu.tw/FPC/.
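The abstract does not state how the FPC turns TSS-relative position into a propensity. As a minimal sketch of the general idea only, assuming that propensity is the smoothed empirical frequency of TSS-relative positions seen among a reference set of high-confidence TFBSs of the same TF, the Python below scores each putative site by its position and reorders the list. The names `Site`, `positional_propensity` and `rerank`, the 50 bp bins, the -1000 to 0 bp window and the add-one smoothing are all hypothetical choices, not the FPC's actual parameters.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Site:
    """A putative TFBS (hypothetical record layout used only for this sketch)."""
    name: str
    offset: int   # position in bp relative to the TSS (negative = upstream)
    score: float  # original statistical score from the prediction algorithm


def positional_propensity(reference_offsets, bin_size=50, window=(-1000, 0), pseudocount=1.0):
    """Return a function mapping a TSS-relative offset to a smoothed bin frequency.

    Assumption: the distribution is estimated from `reference_offsets`, the
    TSS-relative positions of high-confidence TFBSs of the same TF.
    """
    lo, hi = window
    n_bins = (hi - lo) // bin_size
    counts = Counter((o - lo) // bin_size for o in reference_offsets if lo <= o < hi)
    total = sum(counts.values()) + pseudocount * n_bins

    def propensity(offset):
        if not lo <= offset < hi:
            return 0.0  # outside the modelled promoter window
        return (counts[(offset - lo) // bin_size] + pseudocount) / total

    return propensity


def rerank(sites, reference_offsets, **kwargs):
    """Reorder sites by positional propensity, breaking ties with the original score."""
    prop = positional_propensity(reference_offsets, **kwargs)
    return sorted(sites, key=lambda s: (prop(s.offset), s.score), reverse=True)


if __name__ == "__main__":
    reference = [-60, -75, -90, -110, -640]  # toy TSS-relative offsets (bp)
    sites = [Site("a", -800, 9.1), Site("b", -80, 7.4), Site("c", -2000, 9.8)]
    print([s.name for s in rerank(sites, reference)])  # ['b', 'a', 'c']
```

By original score the order would be c, a, b; after reranking, the site near the cluster of reference positions moves to the top. Breaking ties with the original score keeps the algorithm's statistical ranking among sites with equal positional propensity. This sketch ignores strand orientation and alternative TSSs.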

Highlights

  • Cells respond to internal or external stimuli by changing their gene expression [1,2], a process in which a gene is transcribed by RNA polymerase into mRNA, which conveys information to ribosomes for protein synthesis

  • Our results are robust to the number of high-confidence transcription factor binding sites (TFBSs) required

  • To prove that our post-processor is effective in extracting functional TFBSs from the original TFBS dataset, we must show that the top-ranked TFBSs in our reordered dataset are more likely to be functional than the top-ranked TFBSs in the original dataset

Summary

Introduction

Cells respond to internal or external stimuli by changing their gene expression [1,2], a process in which a gene is transcribed by RNA polymerase into mRNA, which conveys information to ribosomes for protein synthesis. Transcription factors (TFs) regulate this process by binding to specific DNA sequences, the TFBSs, in the regulatory regions of their target genes. Researchers have developed both experimental and computational approaches to identify TFBSs. Examples of experimental approaches include those reported in [3,4,5]. Computational approaches, on the other hand, exploit the partial conservation of TFBS nucleotide sequences. The best-known classical method used position weight matrices (PWMs), which quantitatively record the frequency and variability of each of the four nucleotides at each position of a binding site. This approach searched for sequences matching the consensus captured by the PWM and assumed that each nucleotide interacts with the TF independently [6,7]. A number of techniques or factors have been considered in the literature: (i) extracting the maximal sequence from experimentally identified binding sites to improve motif models [14], (ii) supplementing motif models with additional genomic attributes such as evolutionary conservation [14], and (iii) using the similarity between the gene expression profiles of a TF and its target genes as extra information [15].
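The classical PWM approach described above can be sketched in a few lines of Python. This is a generic illustration of PWM scanning under the per-position independence assumption, not the method of any specific tool cited in [6,7]; the function names `build_pwm`, `score` and `scan`, the pseudocount of 0.5 and the uniform background are assumptions made for the example, and only the forward strand is scanned.

```python
import math

ALPHABET = "ACGT"


def build_pwm(sites, pseudocount=0.5, background=None):
    """Build a log2-odds PWM from aligned, equal-length binding sites.

    Assumptions: uniform background and a fixed pseudocount per nucleotide.
    """
    background = background or {b: 0.25 for b in ALPHABET}
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("sites must be aligned to the same length")
    pwm = []
    for i in range(length):
        column = [s[i] for s in sites]
        total = len(column) + pseudocount * len(ALPHABET)
        pwm.append({
            b: math.log2((column.count(b) + pseudocount) / total / background[b])
            for b in ALPHABET
        })
    return pwm


def score(pwm, window):
    """Sum per-position log-odds: each nucleotide contributes independently."""
    return sum(col[b] for col, b in zip(pwm, window))


def scan(pwm, sequence, threshold=0.0):
    """Slide the PWM along `sequence` (forward strand only) and return hits, best first."""
    w = len(pwm)
    hits = []
    for start in range(len(sequence) - w + 1):
        window = sequence[start:start + w]
        if set(window) <= set(ALPHABET):  # skip windows containing N or other symbols
            s = score(pwm, window)
            if s >= threshold:
                hits.append((start, window, s))
    return sorted(hits, key=lambda h: h[2], reverse=True)


if __name__ == "__main__":
    pwm = build_pwm(["TATAAA", "TATAAT", "TATATA", "TATAAA"])  # toy aligned sites
    best = scan(pwm, "GGCTATAAAGGCGC")[0]
    print(best[:2])  # (3, 'TATAAA')
```

Because the score is a plain sum of per-position terms, the sketch makes the independence assumption explicit: dependencies between positions are not modelled.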
