Abstract

BackgroundIn the human genome, 98% of DNA sequences are non-protein-coding regions that were previously disregarded as junk DNA. In fact, non-coding regions host a variety of cis-regulatory regions which precisely control the expression of genes. Thus, Identifying active cis-regulatory regions in the human genome is critical for understanding gene regulation and assessing the impact of genetic variation on phenotype. The developments of high-throughput sequencing and machine learning technologies make it possible to predict cis-regulatory regions genome wide.ResultsBased on rich data resources such as the Encyclopedia of DNA Elements (ENCODE) and the Functional Annotation of the Mammalian Genome (FANTOM) projects, we introduce DECRES based on supervised deep learning approaches for the identification of enhancer and promoter regions in the human genome. Due to their ability to discover patterns in large and complex data, the introduction of deep learning methods enables a significant advance in our knowledge of the genomic locations of cis-regulatory regions. Using models for well-characterized cell lines, we identify key experimental features that contribute to the predictive performance. Applying DECRES, we delineate locations of 300,000 candidate enhancers genome wide (6.8% of the genome, of which 40,000 are supported by bidirectional transcription data), and 26,000 candidate promoters (0.6% of the genome).ConclusionThe predicted annotations of cis-regulatory regions will provide broad utility for genome interpretation from functional genomics to clinical applications. The DECRES model demonstrates potentials of deep learning technologies when combined with high-throughput sequencing data, and inspires the development of other advanced neural network models for further improvement of genome annotations.

Highlights

  • In the human genome, 98% of DNA sequences are non-protein-coding regions that were previously disregarded as junk DNA

  • Deep learning accurately distinguishes active enhancers and promoters from background We investigated the capacity of deep learning models to separate enhancers and promoters, and to distinguish them from other regions and between activity states

  • Hereafter we refer to active enhancer, active promoter, active exon, inactive enhancer, inactive promoter, inactive exon, and unknown region as Active enhancer (A-E), Active promoter (A-P), Active exon (A-X), Inactive enhancer (I-E), Inactive promoter (I-P), Inactive exon (I-X), and UK, respectively

Read more

Summary

Introduction

98% of DNA sequences are non-protein-coding regions that were previously disregarded as junk DNA. Non-coding regions host a variety of cis-regulatory regions which precisely control the expression of genes. Identifying active cis-regulatory regions in the human genome is critical for understanding gene regulation and assessing the impact of genetic variation on phenotype. The developments of high-throughput sequencing and machine learning technologies make it possible to predict cis-regulatory regions genome wide. We apply deep supervised analysis methods to identify the positions of active cis-regulatory regions (CRRs), including both enhancers and promoters, across the human genome. Promoters and enhancers act via complex interactions across time and space in the nucleus to control when, where and at what magnitude genes are active.

Methods
Results
Discussion
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call