Abstract

Deep neural networks (DNNs) have achieved state-of-the-art performance in identifying gene regulatory sequences, but they have provided limited insight into the biology of regulatory elements because the complex features they learn are difficult to interpret. Several models of how combinatorial binding of transcription factors, i.e. the regulatory grammar, drives enhancer activity have been proposed, ranging from the flexible TF billboard model to the stringent enhanceosome model. However, there is limited knowledge of the prevalence of these (or other) sequence architectures across enhancers. Here we perform several hypothesis-driven analyses to explore the ability of DNNs to learn the regulatory grammar of enhancers. We created synthetic datasets based on existing hypotheses about combinatorial transcription factor binding site (TFBS) patterns, including homotypic clusters, heterotypic clusters, and enhanceosomes, using real TF binding motifs drawn from diverse TF families. We then trained deep residual neural networks (ResNets) to model the sequences under a range of scenarios that reflect real-world multi-label regulatory sequence prediction tasks. We developed a gradient-based unsupervised clustering method to extract the patterns learned by the ResNet models. We demonstrated that simulated regulatory grammars are best learned in the penultimate layer of the ResNets, and that the proposed method can accurately retrieve the regulatory grammar even when there is heterogeneity in the enhancer categories and a large fraction of TFBSs lie outside the regulatory grammar. However, we also identify common scenarios in which ResNets fail to learn simulated regulatory grammars. Finally, we applied the proposed method to mouse developmental enhancers and identified the components of a known heterotypic TF cluster.
Our results provide a framework for interpreting the regulatory rules learned by ResNets, and they demonstrate that how well and how efficiently ResNets learn the regulatory grammar depends on the nature of the prediction task.
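The gradient-based unsupervised clustering method is only named in this abstract, not specified. As an illustrative sketch of the general idea, and not the authors' implementation, per-sequence saliency vectors (e.g. gradients of penultimate-layer activations with respect to the one-hot input) could be grouped with k-means to recover recurring grammar patterns. The saliency matrix `X`, its 16-dimensional feature size, and the `kmeans` helper below are all hypothetical stand-ins; real vectors would come from the trained ResNet.

```python
import numpy as np

rng = np.random.default_rng(1)

def kmeans(X, k, n_iter=50):
    """Minimal k-means over the rows of X, farthest-point initialization."""
    # start from the first row, then repeatedly add the point farthest
    # from all current centers
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        # assign each point to its nearest center
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # recompute centers; keep the old center if a cluster empties
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

# Mock saliency vectors: two groups of sequences sharing two distinct
# (hypothetical) regulatory grammars A and B.
X = np.vstack([
    rng.normal(0.0, 0.1, size=(50, 16)),  # grammar A
    rng.normal(1.0, 0.1, size=(50, 16)),  # grammar B
])
labels, centers = kmeans(X, k=2)
```

With well-separated groups, the recovered cluster labels partition the sequences by their underlying grammar; inspecting the sequences in each cluster would then reveal the shared TFBS pattern.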

Highlights

  • Enhancers are genomic regions distal to promoters that regulate the dynamic spatiotemporal patterns of gene expression required for the proper differentiation and development of multicellular organisms [1,2,3]

  • Gene regulatory sequences function through the combinatorial binding of transcription factors (TFs)

  • We simulated regulatory sequences based on existing hypotheses about the structure of possible regulatory grammars and trained deep neural networks (DNNs) to model these sequences under a range of scenarios that reflect real-world regulatory sequence prediction tasks
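The highlights describe simulating regulatory sequences by planting TF binding sites in background DNA. A minimal sketch of one such generator for a homotypic cluster, assuming a made-up 4-bp position weight matrix (PWM) and uniform background (the real datasets used motifs from diverse TF families and richer grammar layouts):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical PWM for an illustrative TF motif; each row gives the
# probabilities of A, C, G, T at one motif position.
PWM = np.array([
    [0.90, 0.03, 0.04, 0.03],
    [0.05, 0.05, 0.85, 0.05],
    [0.05, 0.05, 0.05, 0.85],
    [0.85, 0.05, 0.05, 0.05],
])
BASES = "ACGT"

def sample_motif(pwm, rng):
    """Draw one motif instance base-by-base from the PWM."""
    return "".join(BASES[rng.choice(4, p=row)] for row in pwm)

def simulate_homotypic(seq_len=200, n_sites=3, rng=rng):
    """Plant n_sites instances of the same motif (a homotypic cluster)
    at non-overlapping random positions in uniform background DNA."""
    seq = list(rng.choice(list(BASES), size=seq_len))
    width = PWM.shape[0]
    starts = rng.choice(seq_len - width, size=n_sites, replace=False)
    # crude overlap avoidance: resample until sites are >= width apart
    while np.min(np.diff(np.sort(starts))) < width:
        starts = rng.choice(seq_len - width, size=n_sites, replace=False)
    for s in starts:
        seq[s:s + width] = sample_motif(PWM, rng)
    return "".join(seq)

seq = simulate_homotypic()
```

Heterotypic clusters and enhanceosomes could be simulated the same way by planting several different PWMs, with looser or stricter constraints on site order and spacing.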



Introduction

Enhancers are genomic regions distal to promoters that regulate the dynamic spatiotemporal patterns of gene expression required for the proper differentiation and development of multicellular organisms [1,2,3]. Many additional features have been suggested to play a role in determining in vivo TF binding, such as heterogeneity of a TF's binding motif [11], local DNA properties [12], broader sequence context and interposition dependence [13], cooperative binding of the TF with its partners [14,15,16,17], and condition-specific chromatin context [15,18,19]. While both genomic and epigenomic features are important in determining the in vivo occupancy of a TF, recent studies have suggested that the epigenome can be accurately predicted from genomic context [12,20,21,22], supporting the fundamental role of sequence in dictating the binding of TFs [23,24,25,26,27]. It is therefore critical to understand the sequence patterns underlying enhancer regulatory functions and to build sufficiently sophisticated models of enhancer sequence architecture.

