A graphical modelling approach to the dissection of highly correlated transcription factor binding site profiles.

Robert Stojnic, Audrey Qiuyan Fu, Boris Adryan

doi:10.1371/journal.pcbi.1002725

Open Access

https://doi.org/10.1371/journal.pcbi.1002725

Journal: PLoS Computational Biology
Publication Date: Nov 8, 2012
Citations: 51
License type: CC BY 4.0

Affiliation: University of Cambridge

Abstract

Inferring the combinatorial regulatory code of transcription factors (TFs) from genome-wide TF binding profiles is challenging. A major reason is that TF binding profiles significantly overlap and are therefore highly correlated. Clustered occurrence of multiple TFs at genomic sites may arise from chromatin accessibility and local cooperation between TFs, or binding sites may simply appear clustered if the profiles are generated from diverse cell populations. Overlaps in TF binding profiles may also result from measurements taken at closely related time intervals. It is thus of great interest to distinguish TFs that directly regulate gene expression from those that are indirectly associated with gene expression. Graphical models, in particular Bayesian networks, provide a powerful mathematical framework to infer different types of dependencies. However, existing methods do not perform well when the features (here: TF binding profiles) are highly correlated, when their association with the biological outcome is weak, and when the sample size is small. Here, we develop a novel computational method, the Neighbourhood Consistent PC (NCPC) algorithms, which deal with these scenarios much more effectively than existing methods do. We further present a novel graphical representation, the Direct Dependence Graph (DDGraph), to better display the complex interactions among variables. NCPC and DDGraph can also be applied to other problems involving highly correlated biological features. Both methods are implemented in the R package ddgraph, available as part of Bioconductor (http://bioconductor.org/packages/2.11/bioc/html/ddgraph.html). Applied to real data, our method identified TFs that specify different classes of cis-regulatory modules (CRMs) in Drosophila mesoderm differentiation. 
Our analysis also found depletion of the early transcription factor Twist binding at the CRMs regulating expression in visceral and somatic muscle cells at later stages, which suggests a CRM-specific repression mechanism that so far has not been characterised for this class of mesodermal CRMs.
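
The distinction the abstract draws between direct and indirect association can be illustrated with a small conditional independence test. The following is a minimal sketch and not the NCPC algorithm itself: it conditions on at most one other variable, and the counts, the names `A`, `B` and `Y`, and the helper functions are all hypothetical. TF `A` is constructed to influence the outcome `Y` (for example, CRM activity), while TF `B` merely co-occurs with `A`.

```python
from collections import Counter
from math import log

# Hypothetical toy counts of (A bound, B bound, Y active) over 1000 regions.
# Built so that Y depends on A only, while B merely co-occurs with A.
COUNTS = {
    (1, 1, 1): 240, (1, 1, 0): 160,
    (1, 0, 1): 60,  (1, 0, 0): 40,
    (0, 1, 1): 30,  (0, 1, 0): 70,
    (0, 0, 1): 120, (0, 0, 0): 280,
}
FEATURES = {"A": 0, "B": 1}   # column index of each TF binding profile
TARGET = 2                    # column index of the outcome Y
# Chi-square critical values at alpha = 0.05 for 1 and 2 degrees of freedom.
CRITICAL = {1: 3.841, 2: 5.991}


def g_statistic(counts, x, y, given=()):
    """G-test statistic for X vs Y, summed over the strata defined by `given`."""
    strata = {}
    for cell, n in counts.items():
        key = tuple(cell[i] for i in given)
        strata.setdefault(key, Counter())[(cell[x], cell[y])] += n
    g = 0.0
    for table in strata.values():
        total = sum(table.values())
        row, col = Counter(), Counter()
        for (xv, yv), n in table.items():
            row[xv] += n
            col[yv] += n
        for (xv, yv), n in table.items():
            if n > 0:
                g += 2.0 * n * log(n * total / (row[xv] * col[yv]))
    return g


def dependent(counts, x, y, given=()):
    """True if independence is rejected at alpha = 0.05 (binary variables only)."""
    df = 2 ** len(given)  # one degree of freedom per stratum of the conditioning set
    return g_statistic(counts, x, y, given) > CRITICAL[df]


def classify(counts):
    """Label each feature 'direct', 'indirect' or 'none' with respect to Y."""
    labels = {}
    for name, idx in FEATURES.items():
        if not dependent(counts, idx, TARGET):
            labels[name] = "none"
            continue
        others = [j for j in FEATURES.values() if j != idx]
        # Indirect if some single other feature explains the association away.
        explained = any(not dependent(counts, idx, TARGET, (j,)) for j in others)
        labels[name] = "indirect" if explained else "direct"
    return labels


print(classify(COUNTS))
```

On these toy counts both profiles are marginally associated with `Y`, but the association of `B` disappears once `A` is conditioned on, so the sketch labels `A` as direct and `B` as indirect. With real data the features are far more numerous, the effects weaker and the samples smaller, which is the setting the abstract says existing methods handle poorly.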

Highlights

A major area in genome research is understanding how the regulatory information is encoded
Work over the past few decades has resulted in the notion of a combinatorial regulatory code: the concerted binding of a context-specific set of transcription factors (TFs) to regulatory sequences, which is crucial for proper gene expression
Transcription factors (TFs) are proteins that bind to DNA and regulate gene expression

Summary

Introduction

A major area in genome research is understanding how regulatory information is encoded. A canonical example of the traditional dissection of regulatory sequences is the identification of the various stripe enhancers of the Drosophila even-skipped gene, which respond to different TFs involved in early patterning (see [2,3] for review). Although inference of the regulatory code may benefit greatly from additional data, such as the expression patterns of the genes of interest under mutant conditions, such data are often difficult to collect at the genome level. Recent studies provide evidence for so-called “hotspots” to which many interacting or non-interacting TFs may bind [6,8,9], which leads to high correlations among the binding profiles of both functionally “relevant” and functionally “irrelevant” TFs. Distinguishing relevant TFs from the others therefore remains a significant challenge in understanding the combinatorial regulatory code.
