Abstract

Promoters and enhancers are well-known regulatory elements that modulate gene expression. High-throughput sequencing technologies have confirmed that both are transcribed bidirectionally: promoters produce stable mRNA in the sense direction and unstable RNA in the antisense direction, whereas enhancers produce unstable RNA in both directions. Although enhancers and promoters are thought to share a similar architecture of transcription start sites (TSSs), how the transcriptional machinery uses these genomic regions distinctly as promoters or enhancers remains unclear. To address this issue, we developed a deep learning (DL) method that combines a convolutional neural network (CNN) with the saliency algorithm. Our CNN achieved higher predictive performance than other classifiers, suggesting the importance of the high-order sequence features captured by the CNN. Moreover, our method revealed substantial sequence differences between enhancers and promoters. Remarkably, the regions 20–120 bp downstream of the center of bidirectional TSSs appeared to contribute to RNA stability. These regions tend to contain more guanines and cytosines in promoters than in enhancers, and this feature contributed to the classification of the regulatory elements. Our CNN-based method can capture complex TSS architectures. We found that the genomic regions around the TSSs of promoters and enhancers contribute to RNA stability, and that a GC-biased sequence composition is a critical determinant of promoter TSSs.
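The GC feature described above can be measured with a short sketch like the following. It assumes that each region is given as a plus-strand DNA string with a known 0-based index for the center of the bidirectional TSS pair, and that "downstream" follows the strand of the transcript of interest. The helper names (`gc_fraction`, `reverse_complement`, `downstream_gc`) and the half-open window convention are hypothetical choices for illustration, not the authors' pipeline.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def gc_fraction(seq: str) -> float:
    """Fraction of G/C among unambiguous bases (A, C, G, T); N and other symbols are ignored."""
    acgt = [b for b in seq.upper() if b in "ACGT"]
    if not acgt:
        return float("nan")
    return sum(b in "GC" for b in acgt) / len(acgt)


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def downstream_gc(region: str, center: int, strand: str = "+",
                  start: int = 20, end: int = 120) -> float:
    """GC fraction of the window [center + start, center + end) read in the
    direction of `strand` (hypothetical helper; `region` is given on the + strand)."""
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if strand == "-":
        # Mirror the center index so the window is read in the minus-strand transcript's direction.
        center = len(region) - 1 - center
        region = reverse_complement(region)
    window = region[center + start:center + end]
    if len(window) < end - start:
        raise ValueError("region too short for the requested window")
    return gc_fraction(window)
```

Comparing the mean of `downstream_gc` between promoter and enhancer regions gives a simple check of the GC bias; ambiguous bases such as N are excluded from the denominator rather than counted as A/T.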

Highlights

  • To characterize the promoter and enhancer regions, we first prepared publicly available annotations based on GRO-seq (Core et al, 2008) and CAGE tags (Harbers and Carninci, 2005)

  • We defined bidirectional UU (unstable/unstable) and US (unstable/stable) transcription start site (TSS) pairs, which correspond to enhancer and promoter regions, respectively (Figure 1A)
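Pairing divergent TSSs and labeling the pairs can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the paper: the per-TSS `stable` flag, the 300 bp `max_gap`, and the rule "a minus-strand TSS paired with the nearest plus-strand TSS downstream of it on the same chromosome" are hypothetical choices. A pair with no stable transcript is labeled UU (enhancer-like), and a pair with exactly one stable transcript is labeled US (promoter-like).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TSS:
    chrom: str
    pos: int      # 0-based genomic coordinate
    strand: str   # "+" or "-"
    stable: bool  # hypothetical flag: True if the RNA from this TSS is stable


@dataclass
class TSSPair:
    minus: TSS
    plus: TSS

    @property
    def center(self) -> float:
        return (self.minus.pos + self.plus.pos) / 2

    @property
    def label(self) -> Optional[str]:
        n_stable = int(self.minus.stable) + int(self.plus.stable)
        if n_stable == 0:
            return "UU"  # unstable in both directions: enhancer-like
        if n_stable == 1:
            return "US"  # one unstable and one stable transcript: promoter-like
        return None      # stable in both directions: outside the two classes sketched here


def pair_divergent(tsss: List[TSS], max_gap: int = 300) -> List[TSSPair]:
    """Pair each minus-strand TSS with the nearest plus-strand TSS at or downstream
    of it on the same chromosome, within `max_gap` bp (hypothetical threshold).
    A plus-strand TSS may be reused by several minus-strand TSSs in this sketch."""
    plus_by_chrom: Dict[str, List[TSS]] = {}
    for t in tsss:
        if t.strand == "+":
            plus_by_chrom.setdefault(t.chrom, []).append(t)
    pairs = []
    for m in tsss:
        if m.strand != "-":
            continue
        candidates = [p for p in plus_by_chrom.get(m.chrom, [])
                      if 0 <= p.pos - m.pos <= max_gap]
        if candidates:
            nearest = min(candidates, key=lambda p: p.pos - m.pos)
            pairs.append(TSSPair(minus=m, plus=nearest))
    return pairs
```

Convergent arrangements (a plus-strand TSS upstream of a minus-strand TSS) are not paired, because divergent transcription runs away from the pair center in both directions.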


Summary

INTRODUCTION

Promoters are defined as DNA regions where transcription is initiated (Lenhard et al, 2012; Haberle and Stark, 2018). Enhancer sequences, located distal to their target promoters, contain DNA motifs that act as binding sites for transcription factors (TFs) and cofactors. These historical definitions are dichotomous: they treat promoters and enhancers as distinct regulatory elements. A hidden Markov model incorporating these motifs predicted transcript stability with relatively low accuracy (63%) (Core et al, 2014). In another approach, a support vector machine (SVM) using hexamer nucleotide features improved the separation of promoters from enhancers identified by the FANTOM consortium (AUC 0.86) (Colbran et al, 2019). To characterize the TSS architectures that are indispensable for these distinct regulatory activities, we employed the saliency map (Simonyan, 2013) to extract the most influential sequence features.
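A saliency map is the gradient of a model's output score with respect to its input (Simonyan, 2013). The sketch below computes that gradient exactly for a toy, untrained CNN (one convolutional filter, ReLU, global max pooling, linear output) applied to one-hot DNA. The architecture, weights, and function names are hypothetical and far smaller than any real classifier; the per-position map takes the maximum absolute gradient over the four nucleotide channels, as the original saliency-map paper does over image color channels.

```python
from typing import List, Tuple

Matrix = List[List[float]]
BASES = "ACGT"


def one_hot(seq: str) -> Matrix:
    """Encode DNA as an L x 4 matrix (columns A, C, G, T); other symbols become all-zero rows."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq.upper()]


def conv_scores(x: Matrix, W: Matrix, bias: float) -> List[float]:
    """Valid 1D cross-correlation of one filter W (k x 4) over x (L x 4); assumes L >= k."""
    k = len(W)
    return [
        sum(W[t][c] * x[j + t][c] for t in range(k) for c in range(4)) + bias
        for j in range(len(x) - k + 1)
    ]


def predict(x: Matrix, W: Matrix, bias: float, w_out: float, b_out: float) -> float:
    """Toy CNN score: conv -> ReLU -> global max pooling -> linear output."""
    z = conv_scores(x, W, bias)
    return w_out * max(max(zj, 0.0) for zj in z) + b_out


def saliency(x: Matrix, W: Matrix, bias: float, w_out: float) -> Tuple[Matrix, List[float]]:
    """Exact gradient of `predict` w.r.t. the one-hot input, plus the per-position
    map max_c |d score / d x[i][c]|."""
    z = conv_scores(x, W, bias)
    j_star = max(range(len(z)), key=lambda j: z[j])  # window chosen by max pooling (first on ties)
    grad = [[0.0] * 4 for _ in x]
    if z[j_star] > 0:  # ReLU passes gradient only when its input is positive
        for t in range(len(W)):
            for c in range(4):
                grad[j_star + t][c] = w_out * W[t][c]
    per_position = [max(abs(g) for g in row) for row in grad]
    return grad, per_position
```

With a filter that matches the dinucleotide GC (weights `[[0, 0, 1, 0], [0, 1, 0, 0]]`, bias -1.5), only the two positions covered by the max-pooled window in "ATGCAT" receive nonzero saliency. In a trained network the same quantity is usually obtained by automatic differentiation rather than by hand.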

MATERIALS AND METHODS
RESULTS
DATA AVAILABILITY STATEMENT