Abstract

Background
Previous work has demonstrated that chromatin feature levels correlate with gene expression. The ENCODE project enables us to further explore this relationship using an unprecedented volume of data. Expression levels from more than 100,000 promoters were measured using a variety of high-throughput techniques applied to RNA extracted by different protocols from different cellular compartments of several human cell lines. ENCODE also generated genome-wide maps of eleven histone marks, one histone variant, and DNase I hypersensitivity sites in seven cell lines.

Results
We built a novel quantitative model to study the relationship between chromatin features and expression levels. Our study not only confirms that the general relationships found in previous studies hold across various cell lines, but also suggests new aspects of the relationship between chromatin features and gene expression levels. We found that expression status and expression levels can be predicted by different groups of chromatin features, both with high accuracy. We also found that expression levels measured by CAGE are better predicted than those measured by RNA-PET or RNA-Seq, and that different categories of chromatin features are the most predictive of expression for different RNA measurement methods. Additionally, across different cellular compartments, PolyA+ RNA is overall more predictable than PolyA- RNA; PolyA+ cytosolic RNA measured with RNA-Seq is more predictable than PolyA+ nuclear RNA, whereas the opposite is true for PolyA- RNA.

Conclusions
Our study provides new insights into transcriptional regulation by analyzing chromatin features in different cellular contexts.
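The abstract reports that expression status (on/off) and expression level can each be predicted from different groups of chromatin features. The sketch below shows one generic way to set that up, assuming a two-stage design: a logistic-regression classifier for status, followed by ordinary least squares for the level of promoters that are "on". This is not the authors' model. The feature groups, the function names (`fit_logistic`, `fit_linear`, `two_stage_predict`) and all data are hypothetical and synthetic.

```python
import numpy as np

# Hypothetical two-stage sketch: stage 1 classifies on/off status, stage 2
# regresses the expression level of "on" promoters. All data are synthetic.

def add_bias(X):
    """Prepend a column of ones so the first weight acts as an intercept."""
    return np.hstack([np.ones((X.shape[0], 1)), X])

def fit_logistic(X, y, lr=0.5, n_iter=3000):
    """Logistic regression fitted by batch gradient descent on the log-loss."""
    Xb = add_bias(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Xb @ w)))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def fit_linear(X, y):
    """Ordinary least squares with an intercept."""
    coef, *_ = np.linalg.lstsq(add_bias(X), y, rcond=None)
    return coef

def two_stage_predict(w_status, coef_level, X_status, X_level):
    """Predict status first; assign a level only to promoters predicted 'on'."""
    on = add_bias(X_status) @ w_status > 0.0
    level = np.where(on, add_bias(X_level) @ coef_level, 0.0)
    return on, level

# Synthetic data: 4 made-up chromatin signals per promoter.
rng = np.random.default_rng(0)
n = 1000
signals = rng.normal(size=(n, 4))
status = (signals[:, 0] + 0.3 * rng.normal(size=n) > 0).astype(float)
true_level = np.where(status == 1.0,
                      5.0 + 2.0 * signals[:, 1] + signals[:, 2] + 0.2 * rng.normal(size=n),
                      0.0)

# Assumed split of features: columns 0 and 3 for status, 1 and 2 for level.
X_status, X_level = signals[:, [0, 3]], signals[:, [1, 2]]
train, test = np.arange(700), np.arange(700, n)

w_status = fit_logistic(X_status[train], status[train])
on_train = train[status[train] == 1.0]          # fit levels on expressed promoters only
coef_level = fit_linear(X_level[on_train], true_level[on_train])

pred_on, pred_level = two_stage_predict(w_status, coef_level, X_status[test], X_level[test])
accuracy = np.mean(pred_on == (status[test] == 1.0))

# Score the regression stage on truly expressed test promoters with PCC.
truly_on = status[test] == 1.0
level_hat = add_bias(X_level[test][truly_on]) @ coef_level
r_level = np.corrcoef(level_hat, true_level[test][truly_on])[0, 1]
print(f"status accuracy: {accuracy:.2f}, level PCC (truly on): {r_level:.2f}")
```

Scoring the regression stage only on truly expressed promoters keeps level accuracy separate from status misclassification. The combined predictions in `pred_level` would also include zeros for promoters that the first stage calls "off".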

Highlights

  • Previous work has demonstrated that chromatin feature levels correlate with gene expression

  • Development of a new quantitative model to correlate chromatin features with transcription levels. To further understand the relationship between chromatin features and expression levels under various conditions, we took advantage of the massive high-throughput sequencing data from the ENCODE Consortium [12], which include genomic localization data for eleven histone modifications and one histone variant in seven human cell lines [14], as well as expression quantification data for various cell compartments and RNA extractions in each corresponding cell line

  • Gene expression levels were quantified in two forms: RNA-Seq [15] was used to quantify transcript (Tx)-based expression levels, while cap analysis of gene expression (CAGE) [16,17] and the 5’ tags of RNA paired-end tag (RNA-PET) [18] were used to capture transcription start site (TSS)-based expression levels [19]

Summary

Introduction

Previous work has demonstrated that chromatin feature levels correlate with gene expression. Wang et al. [7] systematically analyzed 39 histone modifications in human CD4+ T cells and found that histone acetylation positively correlates with gene expression, consistent with its role in transcriptional activation. Cheng et al. [11] derived a support vector machine model from modENCODE worm data and applied it to human K562 cells and mouse embryonic stem cells with good performance (Pearson’s correlation coefficient (PCC) r = 0.73 and 0.74, respectively). Both studies successfully quantified the relationship between histone modifications and gene expression. However, because these studies used limited human datasets (for example, only one cell line and/or no information regarding RNA type), it is still largely unknown whether this relationship holds in other cellular contexts.
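Cheng et al. report model performance as Pearson’s correlation coefficient (PCC) between predicted and measured expression. For reference, PCC is r = Σ(xᵢ − x̄)(yᵢ − ȳ) / √(Σ(xᵢ − x̄)² · Σ(yᵢ − ȳ)²). A minimal plain-Python sketch follows; the function name `pearson_r` and the example values are illustrative only and are not taken from either study.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Covariance numerator and the two sums of squared deviations.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("PCC is undefined when either input has zero variance")
    return sxy / math.sqrt(sxx * syy)

# Illustrative values only (not data from the cited studies).
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(round(pearson_r(measured, predicted), 3))  # prints 0.991
```

Because PCC measures linear association rather than absolute agreement, a model whose predictions are uniformly shifted or scaled relative to the measurements can still score r close to 1.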
