Discovering epistatic feature interactions from neural network models of regulatory DNA sequences.

Peyton Greenside,Anshul Kundaje,Polly Fordyce,Tyler Shimko

doi:10.1093/bioinformatics/bty575

Open Access

https://doi.org/10.1093/bioinformatics/bty575

Abstract

Motivation: Transcription factors bind regulatory DNA sequences in a combinatorial manner to modulate gene expression. Deep neural networks (DNNs) can learn the cis-regulatory grammars encoded in regulatory DNA sequences associated with transcription factor binding and chromatin accessibility. Several feature attribution methods have been developed for estimating the predictive importance of individual features (nucleotides or motifs) in any input DNA sequence to its associated output prediction from a DNN model. However, these methods do not reveal higher-order feature interactions encoded by the models.

Results: We present a new method called Deep Feature Interaction Maps (DFIM) to efficiently estimate interactions between all pairs of features in any input DNA sequence. DFIM accurately identifies ground truth motif interactions embedded in simulated regulatory DNA sequences. DFIM identifies synergistic interactions between GATA1 and TAL1 motifs from in vivo TF binding models. DFIM reveals epistatic interactions involving nucleotides flanking the core motif of the Cbf1 TF in yeast from in vitro TF binding models. We also apply DFIM to regulatory sequence models of in vivo chromatin accessibility to reveal interactions between regulatory genetic variants and proximal motifs of target TFs as validated by TF binding quantitative trait loci. Our approach makes significant strides in improving the interpretability of deep learning models for genomics.

Availability and implementation: Code is available at: https://github.com/kundajelab/dfim.

Supplementary information: Supplementary data are available at Bioinformatics online.

Highlights

Genome-wide biochemical profiling experiments have revealed millions of putative regulatory elements in diverse cell states. These massive datasets have spurred the development of deep neural network (DNN) models to predict cell-type-specific or context-specific molecular phenotypes such as TF binding, chromatin accessibility and gene expression from DNA sequence (Alipanahi et al., 2015; Kelley et al., 2016; Zhou and Troyanskaya, 2015).
A perturbation-based, forward-propagation approach known as in silico mutagenesis (ISM) quantifies the importance of a nucleotide in an input DNA sequence as the maximal change in the output prediction from the DNN model when the observed nucleotide at that position is mutated to any of the alternative bases (e.g. A, C or T when the observed base is G).
We present an efficient approach called Deep Feature Interaction Maps (DFIM) to estimate pairwise interactions between features in an input DNA sequence mapped to an associated regulatory phenotype by a neural network.
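
The ISM definition above can be illustrated with a minimal sketch. It assumes "maximal change" means the largest absolute change in the output, and it uses a hypothetical `toy_model` (a best-match scorer for the string GATA) in place of a trained DNN. The function names and the example sequence are illustrative and are not taken from the paper or its code.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a (length, 4) one-hot array in A, C, G, T order."""
    x = np.zeros((len(seq), 4))
    for i, base in enumerate(seq):
        x[i, BASES.index(base)] = 1.0
    return x

def toy_model(x):
    """Hypothetical stand-in for a trained DNN: best match count to 'GATA'."""
    motif = one_hot("GATA")
    scores = [np.sum(x[i:i + 4] * motif) for i in range(len(x) - 3)]
    return float(max(scores))

def ism_importance(seq, model):
    """Per-position ISM score: the largest absolute change in the model
    output over the three alternative bases at that position (assumption:
    'maximal change' is taken as maximal absolute change)."""
    ref = model(one_hot(seq))
    importance = np.zeros(len(seq))
    for i, observed in enumerate(seq):
        for alt in BASES:
            if alt == observed:
                continue  # only mutate to the alternative bases
            mutant = seq[:i] + alt + seq[i + 1:]
            delta = abs(model(one_hot(mutant)) - ref)
            importance[i] = max(importance[i], delta)
    return importance

scores = ism_importance("CCGATACC", toy_model)
```

With this toy scorer, only the four positions inside the embedded GATA site receive a non-zero score. Note that ISM needs three forward passes per position, so the cost grows linearly with sequence length for single features and quadratically if every pair of positions is perturbed.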

Introduction

Genome-wide biochemical profiling experiments have revealed millions of putative regulatory elements in diverse cell states. These massive datasets have spurred the development of deep neural network (DNN) models to predict cell-type-specific or context-specific molecular phenotypes such as TF binding, chromatin accessibility and gene expression from DNA sequence (Alipanahi et al., 2015; Kelley et al., 2016; Zhou and Troyanskaya, 2015). The primary appeal of DNNs is that they are capable of learning predictive sequence features and modeling non-linear feature interactions directly from raw DNA sequence without any prior assumptions. Interpreting these purported black-box models could reveal novel insights into the combinatorial regulatory code.
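
To make "non-linear feature interaction" concrete, the sketch below uses the generic double-mutant definition of epistasis: the output with both features present, minus each single-feature output, plus the output with neither. This is a textbook-style illustration, not the DFIM procedure, and both model functions are hypothetical stand-ins with made-up coefficients.

```python
def synergistic_model(has_a, has_b):
    """Hypothetical output: additive motif effects plus a synergy term."""
    return 1.0 * has_a + 0.5 * has_b + 2.0 * has_a * has_b

def additive_model(has_a, has_b):
    """Hypothetical output with no interaction between the two motifs."""
    return 1.0 * has_a + 0.5 * has_b

def pairwise_interaction(model):
    """Double-mutant epistasis: f(A,B) - f(A) - f(B) + f(neither).
    Zero means the two features contribute independently."""
    return model(1, 1) - model(1, 0) - model(0, 1) + model(0, 0)

synergy = pairwise_interaction(synergistic_model)   # 2.0
no_synergy = pairwise_interaction(additive_model)   # 0.0
```

A purely additive model gives an interaction of zero, so any non-zero value reflects a dependency between the two features that single-feature importance scores cannot capture.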
