A Machine Learning Approach for Identifying Novel Cell Type–Specific Transcriptional Regulators of Myogenesis

Brian W Busser, Leila Taher, Terese Tansey, Alan M Michelson, Yongsok Kim, Molly J Bloom, Ivan Ovcharenko, James W Posakony

doi:10.1371/journal.pgen.1002531

Abstract

Transcriptional enhancers integrate the contributions of multiple classes of transcription factors (TFs) to orchestrate the myriad spatio-temporal gene expression programs that occur during development. A molecular understanding of enhancers with similar activities requires the identification of both their unique and their shared sequence features. To address this problem, we combined phylogenetic profiling with a DNA-based enhancer sequence classifier that analyzes the TF binding sites (TFBSs) governing the transcription of a co-expressed gene set. We first assembled a small number of enhancers that are active in Drosophila melanogaster muscle founder cells (FCs) and other mesodermal cell types. Using phylogenetic profiling, we increased the number of enhancers by incorporating orthologous but divergent sequences from other Drosophila species. Functional assays revealed that the diverged enhancer orthologs were active in patterns largely similar to those of their D. melanogaster counterparts, although there was extensive evolutionary shuffling of known TFBSs. We then built and trained a classifier using this enhancer set and identified additional related enhancers based on the presence or absence of known and putative TFBSs. Predicted FC enhancers were over-represented in proximity to known FC genes, and many of the TFBSs learned by the classifier were found to be critical for enhancer activity, including POU homeodomain, Myb, Ets, Forkhead, and T-box motifs. Empirical testing also revealed that the T-box TF encoded by org-1 is a previously uncharacterized regulator of muscle cell identity. Finally, we found extensive diversity in the composition of TFBSs within known FC enhancers, suggesting that motif combinatorics plays an essential role in the cellular specificity exhibited by such enhancers.
In summary, machine learning combined with evolutionary sequence analysis is useful for recognizing novel TFBSs and for facilitating the identification of cognate TFs that coordinate cell type–specific developmental gene expression patterns.

Highlights

Complex spatio-temporal gene expression programs guide the progressive determination of pluripotent cells, allowing cell fates to become sequentially restricted during embryonic development.
This study comprises four main components: (1) compiling a training set of founder cell (FC) enhancers from multiple sources, including the literature and tests of additional computational predictions from a previous study [5], and enlarging the dataset through phylogenetic profiling, with empirical validation of a subset of those predictions; (2) machine learning on the FC enhancer training set; (3) experimental validation of classifier predictions using transgenic reporter assays and whole-embryo in situ hybridization with gene-specific probes; and (4) functional examination of sequence features associated with the computational classification to define novel motifs and transcription factors (TFs) regulating myogenesis.
We used the information derived from these studies to examine the distribution of TF binding sites (TFBSs) across the entire set of known FC enhancers and to ascertain the extent to which TF combinatorics contributes to the diversity of FC enhancer activities.
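To make the last highlight concrete, here is a minimal Python sketch of how the distribution of TFBS classes across a set of enhancers might be tabulated. It is an illustration under stated assumptions, not the authors' analysis: the enhancer names (`enh_1` to `enh_4`) and their motif assignments are invented, and only the motif class names (Ets, T-box, Forkhead, POU, Myb) are taken from the abstract.

```python
from collections import Counter
from itertools import combinations

# Hypothetical annotation: enhancer name -> motif classes detected in it.
# The assignments are made up for illustration; they are not data from the study.
enhancer_motifs = {
    "enh_1": {"Ets", "T-box", "Forkhead"},
    "enh_2": {"Ets", "POU"},
    "enh_3": {"T-box", "Myb"},
    "enh_4": {"Ets", "T-box"},
}

def motif_frequencies(annotation):
    """Count how many enhancers contain each motif class."""
    return Counter(m for motifs in annotation.values() for m in motifs)

def pair_cooccurrence(annotation):
    """Count how many enhancers contain each unordered pair of motif classes."""
    pairs = Counter()
    for motifs in annotation.values():
        pairs.update(combinations(sorted(motifs), 2))
    return pairs

def distinct_combinations(annotation):
    """Count how many enhancers share each exact motif combination."""
    return Counter(frozenset(motifs) for motifs in annotation.values())

freqs = motif_frequencies(enhancer_motifs)
pairs = pair_cooccurrence(enhancer_motifs)
combos = distinct_combinations(enhancer_motifs)

print("Motif frequencies:", dict(freqs))
print("Most common pair:", pairs.most_common(1))
print("Distinct combinations:", len(combos), "across", len(enhancer_motifs), "enhancers")
```

In this toy input every enhancer carries a different motif combination, which is the kind of pattern one would look for when asking whether combinatorics, rather than a single shared motif set, distinguishes enhancer activities.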

Summary

Introduction

Complex spatio-temporal gene expression programs guide the progressive determination of pluripotent cells, allowing cell fates to become sequentially restricted during embryonic development. These transitions in cell fate are encoded in the genome by cis-regulatory DNA sequences such as transcriptional enhancers. Several groups have identified enhancers based on the presence of shared sequence features without the necessity of knowing the co-regulating TFs or their binding motifs [6,7,8,9,10,11,12]. These enhancer modeling approaches generally take advantage of two data sources: (1) the non-coding sequences surrounding the members of a gene set of interest, or a set of previously validated enhancers associated with such genes; and (2) previously described sequence motifs from transcription factor binding site (TFBS) libraries and/or de novo motif discovery. A particular transcriptional regulatory model can be validated by assaying the functionality of the motifs that are found to be relevant for making predictions, and subsequently by identifying the DNA binding proteins that target these sequences.
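The two data sources described above (a set of sequences and a motif library) can be combined into a presence/absence feature vector per sequence, on which a classifier is then trained. The Python sketch below shows that general idea only. Everything specific in it is an assumption made for illustration: the three consensus motifs, the toy sequences, and the choice of a Bernoulli naive Bayes model are placeholders, and none of them is the classifier, motif set, or data used in the study.

```python
import math
import re

# Hypothetical IUPAC consensus motifs (placeholders, not curated TFBS models).
MOTIFS = {"motif_1": "CCGGAWGT", "motif_2": "AGGTGTGA", "motif_3": "ATGCAAAT"}

# Subset of IUPAC nucleotide codes translated to regex character classes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]",
         "W": "[AT]", "S": "[CG]", "K": "[GT]", "M": "[AC]", "N": "[ACGT]"}

def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def has_motif(seq, consensus):
    """True if the consensus matches the sequence on either strand."""
    pattern = re.compile("".join(IUPAC[c] for c in consensus))
    return bool(pattern.search(seq) or pattern.search(revcomp(seq)))

def featurize(seq):
    """Binary presence/absence vector, one entry per motif in MOTIFS."""
    seq = seq.upper()
    return [1 if has_motif(seq, m) else 0 for m in MOTIFS.values()]

def train(X, y):
    """Fit a Bernoulli naive Bayes model with add-one (Laplace) smoothing."""
    model = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        log_prior = math.log(len(rows) / len(X))
        probs = [(sum(col) + 1) / (len(rows) + 2) for col in zip(*rows)]
        model[label] = (log_prior, probs)
    return model

def predict(model, x):
    """Return the label with the highest log posterior for feature vector x."""
    scores = {}
    for label, (log_prior, probs) in model.items():
        scores[label] = log_prior + sum(
            math.log(p if v else 1 - p) for p, v in zip(probs, x))
    return max(scores, key=scores.get)

# Toy sequences standing in for validated enhancers (1) and background (0).
positives = ["TTACCGGAAGTCCAGGTGTGATT", "GGTCACACCTAAACCGGATGTAA", "ACCGGAAGTTTTTTT"]
negatives = ["ATGCAAATCCCCTTTTGGGG", "CATTTGCATCGATCGAT", "ACGTACGTACGTACGT"]

X = [featurize(s) for s in positives + negatives]
y = [1] * len(positives) + [0] * len(negatives)
model = train(X, y)

query = "TTTCCGGAAGTAGGTGTGAC"
print(featurize(query), predict(model, featurize(query)))
```

Scanning both strands matters because a binding site can occur in either orientation; the second toy positive contains `motif_2` only as its reverse complement. In a realistic setting the learned per-motif probabilities would then point to the motifs worth testing experimentally, which is the validation route the paragraph above describes.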

Journal: PLoS Genetics	Publication Date: Mar 8, 2012
Citations: 140	License type: CC0 1.0
