Abstract

Background: ChIPx (i.e., ChIP-seq and ChIP-chip) is increasingly used to map genome-wide transcription factor (TF) binding sites. A single ChIPx experiment can identify thousands of TF bound genes, but typically only a fraction of these genes are functional targets that respond transcriptionally to perturbations of TF expression. To identify promising functional target genes for follow-up studies, researchers usually collect gene expression data from TF perturbation experiments to determine which of the TF targets respond transcriptionally to binding. Unfortunately, approximately 40% of ChIPx studies do not have accompanying gene expression data from TF perturbation experiments. For these studies, genes are often prioritized solely based on the binding strengths of ChIPx signals in order to choose follow-up candidates. ChIPXpress is a novel method that improves upon this ChIPx-only ranking approach by integrating ChIPx data with large amounts of Publicly available gene Expression Data (PED).

Results: We demonstrate that PED does contain useful information to identify functional TF target genes despite its inherent heterogeneity. A truncated absolute correlation measure is developed to better capture the regulatory relationships between TFs and their target genes in PED. By integrating the information from ChIPx and PED, ChIPXpress can significantly increase the chance of finding functional target genes responsive to TF perturbation among the top ranked genes. ChIPXpress is implemented as an easy-to-use R/Bioconductor package. We evaluate ChIPXpress using 10 different ChIPx datasets in mouse and human and find that ChIPXpress rankings are more accurate than rankings based solely on ChIPx data and may result in substantial improvement in prediction accuracy, irrespective of which peak calling algorithm is used to analyze the ChIPx data.

Conclusions: ChIPXpress provides a new tool to better prioritize TF bound genes from ChIPx experiments for follow-up studies when investigators do not have their own gene expression data. It demonstrates that the regulatory information from PED can be used to boost ChIPx data analyses. It also represents an important step towards more fully utilizing the valuable, but highly heterogeneous data contained in public gene expression databases.

Highlights

  • ChIPx (i.e., ChIP-seq and ChIP-chip) is increasingly used to map genome-wide transcription factor (TF) binding sites

  • We show that by using Publicly available gene Expression Data (PED) in its entirety, ChIPXpress makes it possible to improve functional target gene identification even when the gene expression data from the matching cell types and biological conditions are unavailable in public gene expression databases

  • The normalized expression values of the same gene can be meaningfully compared across samples in spite of their heterogeneous origins [14]. This is consistent with similar observations made by others [15]. Our exploration of these two PED compendiums further confirmed these observations, as we found that TFs and their known functional target genes (TG) are often highly correlated across compendium samples despite their diverse lab and cell type origins


Summary

Introduction

ChIPx, including ChIP-seq [1,2] and ChIP-chip [3,4], is a powerful and increasingly used technology for mapping genome-wide transcription factor (TF) binding sites (TFBSs). Biologists often use it as a genome-wide screen to identify promising TF target genes to design follow-up studies or develop mechanistic hypotheses. A survey of 58 published ChIPx studies randomly chosen from the Gene Expression Omnibus (GEO) [6] shows that around 40% (24/58) of the existing ChIPx studies do not have accompanying gene expression data. In these cases, investigators usually prioritize TF bound genes according to the strength of the ChIPx binding signals and choose follow-up candidates from the top ranked genes, based on the assumption that the top ranked binding targets are more likely to be functional target genes than the lower ranked binding targets [7]. To improve upon this ChIPx-only approach, we developed ChIPXpress to better identify functional TF target genes among TF binding targets when corresponding TF perturbation data are unavailable.
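The general idea of re-ranking TF bound genes by combining ChIPx binding strength with TF-gene correlation in PED can be illustrated with a small sketch. The text above does not define the truncated absolute correlation or the integration rule, so everything specific here is an assumption made for illustration: the truncation is modeled as clipping standardized expression values to a fixed range, the integration is a plain sum of ranks, and all function names, gene names, and numbers are hypothetical. This is not the ChIPXpress implementation.

```python
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def truncate(values, limit=2.0):
    """ASSUMED truncation rule: standardize, then clip to [-limit, limit]."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / n)
    if sd == 0:
        return [0.0] * n
    return [max(-limit, min(limit, (v - mean) / sd)) for v in values]


def truncated_abs_corr(tf_expr, gene_expr, limit=2.0):
    """Absolute correlation of the clipped, standardized profiles."""
    return abs(pearson(truncate(tf_expr, limit), truncate(gene_expr, limit)))


def ranks(scores):
    """Map each gene to its rank, where rank 1 is the highest score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {gene: i + 1 for i, gene in enumerate(ordered)}


def combined_ranking(chip_scores, tf_expr, ped_expr):
    """ASSUMED integration rule: order genes by the sum of their ChIPx rank
    and their PED correlation rank, breaking ties by ChIPx rank."""
    corr = {g: truncated_abs_corr(tf_expr, ped_expr[g]) for g in chip_scores}
    r_chip, r_ped = ranks(chip_scores), ranks(corr)
    return sorted(chip_scores, key=lambda g: (r_chip[g] + r_ped[g], r_chip[g]))


# Hypothetical toy data: six compendium samples, three TF bound genes.
tf_expr = [1, 2, 3, 4, 5, 6]
ped_expr = {
    "geneA": [2, 1, 4, 3, 6, 5],              # tracks the TF fairly well
    "geneB": [6, 5, 4, 3, 2, 1],              # perfectly anti-correlated
    "geneC": [3.0, 3.1, 2.9, 3.0, 3.1, 2.9],  # nearly unrelated to the TF
}
chip_scores = {"geneA": 9.0, "geneB": 5.0, "geneC": 7.0}

print(combined_ranking(chip_scores, tf_expr, ped_expr))
```

In this toy input, a ChIPx-only ranking would place geneC above geneB, whereas the combined ranking promotes geneB because its expression varies strongly with the TF across samples. Taking the absolute value means repressed targets count as much as activated ones.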
