Abstract
We evaluate the feasibility of using a biological sample’s transcriptome to predict its genome-wide regulatory element activities measured by DNase I hypersensitivity (DH). We develop BIRD, Big Data Regression for predicting DH, to handle this high-dimensional problem. Applying BIRD to the Encyclopedia of DNA Elements (ENCODE) data, we found that to a large extent gene expression predicts DH, and information useful for prediction is contained in the whole transcriptome rather than limited to a regulatory element’s neighboring genes. We show applications of BIRD-predicted DH in predicting transcription factor-binding sites (TFBSs), turning publicly available gene expression samples in Gene Expression Omnibus (GEO) into a regulome database, predicting differential regulatory element activities, and facilitating regulome data analyses by serving as pseudo-replicates. Besides improving our understanding of the regulome–transcriptome relationship, this study suggests that transcriptome-based prediction can provide a useful new approach for regulome mapping.
Highlights
We evaluate the feasibility of using a biological sample’s transcriptome to predict its genome-wide regulatory element activities measured by DNase I hypersensitivity (DH)
Regulome mapping has been accelerated by high-throughput technologies such as chromatin immunoprecipitation coupled with high-throughput sequencing (ChIP-seq)[1] and sequencing of chromatin accessibility (e.g., DNase-seq[2] for DNase I hypersensitivity (DH), FAIRE-seq[3] for formaldehyde-assisted isolation of regulatory elements, and ATAC-seq[4] for assaying transposase-accessible chromatin)
After filtering out genomic regions with weak or no DH signal across all 40 training cell types, 912,886 genomic loci with unambiguous DNase-seq signal in at least one training cell type were retained for subsequent analyses (“Methods”)
Summary
We evaluate the feasibility of using a biological sample’s transcriptome to predict its genome-wide regulatory element activities measured by DNase I hypersensitivity (DH). Applying BIRD to the Encyclopedia of DNA Elements (ENCODE) data, we found that to a large extent gene expression predicts DH, and information useful for prediction is contained in the whole transcriptome rather than limited to a regulatory element’s neighboring genes. Regulome mapping has been accelerated by high-throughput technologies such as chromatin immunoprecipitation coupled with high-throughput sequencing (ChIP-seq)[1] and sequencing of chromatin accessibility (e.g., DNase-seq[2] for DNase I hypersensitivity (DH), FAIRE-seq[3] for formaldehyde-assisted isolation of regulatory elements, and ATAC-seq[4] for assaying transposase-accessible chromatin). However, these technologies have only been applied to interrogate a small subset of all possible biological contexts defined by different combinations of cell or tissue type, disease state, time, environmental stimuli, and other factors. BIRD can be used to predict differential regulatory element activities, such as changes in chromatin accessibility between different cell types or differentiation time points.
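As an illustration only, the idea of predicting DH signal at many loci from a sample's whole transcriptome can be sketched with a generic regression. The sketch below is not the BIRD algorithm: it fits a plain ridge regression on simulated data, and the function names (`fit_ridge`, `predict_dh`), the penalty `alpha`, and the toy dimensions are all assumptions chosen for the example.

```python
import numpy as np


def fit_ridge(X_train, Y_train, alpha=1.0):
    """Fit a ridge regression of DH on expression (illustrative, not BIRD).

    X_train: (n_samples, n_genes) expression matrix.
    Y_train: (n_samples, n_loci) DH signal matrix.
    """
    x_mean = X_train.mean(axis=0)
    y_mean = Y_train.mean(axis=0)
    Xc = X_train - x_mean
    Yc = Y_train - y_mean
    # Dual form: solve an n_samples x n_samples system, which is cheap
    # when there are far more genes than training samples.
    K = Xc @ Xc.T
    dual = np.linalg.solve(K + alpha * np.eye(K.shape[0]), Yc)
    W = Xc.T @ dual  # (n_genes, n_loci) coefficient matrix
    return W, x_mean, y_mean


def predict_dh(X_new, W, x_mean, y_mean):
    """Predict DH at every locus for new expression samples."""
    return (X_new - x_mean) @ W + y_mean


# Simulated toy data (assumption): expression and DH both depend on a few
# shared latent factors, so expression carries information about DH.
rng = np.random.default_rng(0)
n_train, n_test, n_genes, n_loci, n_factors = 40, 5, 200, 30, 3
Z = rng.normal(size=(n_train + n_test, n_factors))
A = rng.normal(size=(n_factors, n_genes))
B = rng.normal(size=(n_factors, n_loci))
X = Z @ A + 0.1 * rng.normal(size=(n_train + n_test, n_genes))
Y = Z @ B + 0.1 * rng.normal(size=(n_train + n_test, n_loci))

X_train, X_test = X[:n_train], X[n_train:]
Y_train, Y_test = Y[:n_train], Y[n_train:]

W, x_mean, y_mean = fit_ridge(X_train, Y_train, alpha=1.0)
pred = predict_dh(X_test, W, x_mean, y_mean)

# Correlation between predicted and held-out DH across all loci and samples.
corr = np.corrcoef(pred.ravel(), Y_test.ravel())[0, 1]
print(f"held-out correlation: {corr:.3f}")
```

Every locus is predicted from all genes at once rather than from neighboring genes only, which mirrors the whole-transcriptome finding stated above; real data would of course require normalization and a method suited to roughly a million loci.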