Abstract

Background: Multiplex experimental assays coupled to computational predictions are increasingly employed for the simultaneous analysis of many specimens at the genome scale, which quickly generates very large amounts of data. However, inferring valuable biological information from comparisons of very large genomic datasets remains an enormous challenge.

Results: As a study model, we chose the NFI/CTF family of mammalian transcription factors, and we compared the results obtained from a genome-wide study of its binding sites with chromatin structure assays, gene expression microarray data, and in silico binding site predictions. We found that NFI/CTF family members preferentially bind their DNA target sites when these are located around transcription start sites, as compared to control datasets generated by random subsampling of the complete set of NFI binding sites. NFI proteins preferentially associate with the upstream regions of genes that are highly expressed and that are enriched in active chromatin modifications such as H3K4me3 and H3K36me3. We postulate that this is a causal association and that NFI proteins mainly act as activators of transcription. This was documented for one member of the family (NFI-C), which proved to be a more potent gene activator than repressor in global gene expression analysis. Interestingly, we also discovered an association of NFI with the tri-methylation of lysine 9 of histone H3, a chromatin marker previously associated with the protection against silencing of telomeric genes by NFI.

Conclusion: Taken together, our results illustrate approaches that can be taken to analyze large genomic datasets, and provide evidence that NFI family members may act in conjunction with specific chromatin modifications to activate gene expression.

Highlights

  • Multiplex experimental assays coupled to computational predictions are being increasingly employed for the simultaneous analysis of many specimens at the genome scale, which quickly generates very large amounts of data

  • A new high-throughput method has been developed for whole-genome mapping of protein-DNA interactions, based on chromatin immunoprecipitation and next-generation sequencing technology [3,4,5,6,7,8]

  • We devised a random sampling algorithm that can be conveniently applied to large genomic datasets
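
The highlights mention a random sampling algorithm for large genomic datasets but do not describe it. As an illustration only, the sketch below uses reservoir sampling (Algorithm R), a standard technique for drawing a uniform random subset from a stream too large to hold in memory. It is not the authors' algorithm; the `Site` record layout, the function name `reservoir_sample`, and the `seed` parameter are assumptions made for this example.

```python
import random
from typing import Iterable, List, Tuple

# Hypothetical record layout for a binding site: (chromosome, position).
Site = Tuple[str, int]


def reservoir_sample(sites: Iterable[Site], k: int, seed: int = 0) -> List[Site]:
    """Draw up to k sites uniformly at random from a stream (Algorithm R).

    Only k records are held in memory at any time, so the input can be a
    generator reading a very large file line by line.
    """
    rng = random.Random(seed)  # seeded so that control sets are reproducible
    reservoir: List[Site] = []
    for i, site in enumerate(sites):
        if i < k:
            # Fill the reservoir with the first k records.
            reservoir.append(site)
        else:
            # Keep the new record with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = site
    return reservoir


if __name__ == "__main__":
    # Toy stream of 100,000 made-up sites on a single chromosome.
    stream = (("chr1", pos) for pos in range(0, 1_000_000, 10))
    print(len(reservoir_sample(stream, k=1000)))
```

Repeating the draw with different seeds would give independent control subsets of equal size, which is one plausible way to build the kind of subsampled controls the abstract refers to.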

Introduction

Multiplex experimental assays coupled to computational predictions are increasingly employed for the simultaneous analysis of many specimens at the genome scale, which quickly generates very large amounts of data. A new high-throughput method has been developed for whole-genome mapping of protein-DNA interactions, based on chromatin immunoprecipitation and next-generation sequencing technology (termed chromatin immunoprecipitation sequencing, or ChIP-Seq) [3,4,5,6,7,8]. A position weight matrix for NFI/CTF was established using a collection of over 10,000 SELEX-SAGE selected sites, allowing the prediction of its binding affinity to any genomic sequence [28]. Because this prediction matrix is based on NFI binding specificity in vitro, the specificity it describes may differ from that observed in the cell, where interactions with other transcription factors may take place and DNA accessibility may be restricted by chromatin. We assessed the in vivo binding preferences of NFI/CTF, its global functional properties regarding the regulation of gene expression, and the relationship of NFI binding sites with different histone methylation markers typical of either an open or a closed chromatin structure.
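
To make the idea of matrix-based prediction concrete, the following minimal sketch shows how a position weight matrix can score every window of a sequence by summing one weight per position. The matrix `TOY_PWM` is invented for illustration and is not the NFI/CTF matrix from [28]; the function names and the choice to scan only the forward strand are likewise assumptions.

```python
from typing import Dict, List, Tuple

# Invented 4-column weight matrix (log-odds-like scores); NOT the NFI matrix.
TOY_PWM: List[Dict[str, float]] = [
    {"A": -1.0, "C": -1.0, "G": -1.0, "T": 2.0},
    {"A": -1.0, "C": -1.0, "G": 2.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": 2.0, "T": -1.0},
    {"A": -1.0, "C": 2.0, "G": -1.0, "T": -1.0},
]


def score_window(pwm: List[Dict[str, float]], window: str) -> float:
    """Sum the per-position weights; unknown bases such as N contribute 0."""
    return sum(col.get(base, 0.0) for col, base in zip(pwm, window))


def best_hit(pwm: List[Dict[str, float]], seq: str) -> Tuple[int, float]:
    """Return (start, score) of the highest-scoring forward-strand window."""
    seq = seq.upper()
    width = len(pwm)
    if len(seq) < width:
        raise ValueError("sequence is shorter than the matrix")
    scored = [
        (score_window(pwm, seq[i:i + width]), i)
        for i in range(len(seq) - width + 1)
    ]
    # max() returns the first window that reaches the top score.
    top_score, top_start = max(scored, key=lambda pair: pair[0])
    return top_start, top_score


if __name__ == "__main__":
    print(best_hit(TOY_PWM, "AATTGGCAA"))
```

A genome-wide scan would typically also score the reverse complement and apply a score threshold. As the text notes, such in vitro-derived scores do not account for chromatin accessibility or cofactor interactions in the cell.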
