Abstract

Chromatin accessibility is a highly informative structural feature for understanding gene transcription regulation, because it indicates the degree to which nuclear macromolecules such as proteins and RNAs can access chromosomal DNA. Studies have shown that chromatin accessibility is highly dynamic during stress response, stimulus response, and developmental transition. Moreover, physical access to chromosomal DNA in eukaryotes is highly cell-specific. Therefore, current technologies such as DNase-seq, ATAC-seq, and FAIRE-seq reveal only a portion of the open chromatin regions (OCRs) present in a given species. Thus, the genome-wide distribution of OCRs remains unknown. In this study, we developed a bioinformatics tool called CharPlant for the de novo prediction of OCRs in plant genomes. To develop this tool, we constructed a three-layer convolutional neural network (CNN) and subsequently trained the CNN using DNase-seq and ATAC-seq datasets of four plant species. The model simultaneously learns the sequence motifs and regulatory logics, which are jointly used to determine DNA accessibility. All of these steps are integrated into CharPlant, which can be run using a simple command line. The results of data analysis using CharPlant in this study demonstrate its prediction power and computational efficiency. To our knowledge, CharPlant is the first de novo prediction tool that can identify potential OCRs in the whole genome. The source code of CharPlant and supporting files are freely available from https://github.com/Yin-Shen/CharPlant.
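To give a rough intuition for how a convolutional layer can pick up sequence motifs, the sketch below one-hot encodes a DNA string and slides a single filter along it. It is an illustration only, not CharPlant's architecture or code: the function names, the example sequence, and the hand-set "TATA" filter are all assumptions made for this example. In a trained CNN the filter weights are learned from the data rather than written by hand.

```python
BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as rows of 4 values (A, C, G, T).

    Bases outside ACGT (for example N) become all-zero rows.
    """
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq.upper()]

def conv1d_relu(encoded, kernel):
    """Slide one filter (width x 4 weights) along the sequence, then apply ReLU."""
    width = len(kernel)
    out = []
    for start in range(len(encoded) - width + 1):
        score = sum(
            encoded[start + i][j] * kernel[i][j]
            for i in range(width)
            for j in range(4)
        )
        out.append(max(0.0, score))
    return out

# Hypothetical hand-set filter that rewards the motif "TATA"
# (columns are A, C, G, T; one row per motif position).
tata_filter = [
    [0, 0, 0, 1],  # T
    [1, 0, 0, 0],  # A
    [0, 0, 0, 1],  # T
    [1, 0, 0, 0],  # A
]

activations = conv1d_relu(one_hot("GGTATAGG"), tata_filter)
best_position = activations.index(max(activations))  # window where the motif matches best
```

Taking the maximum activation over positions, as in the last line, mirrors the max-pooling step that commonly follows a convolutional layer in sequence models.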

Highlights

  • In eukaryotic genomes, most of the chromatin is tightly coiled in the nucleus, but some regions, known as open chromatin regions (OCRs) or chromatin accessible regions, are loosely formed after chromatin remodeling

  • A number of cis-regulatory elements interact with trans-acting factors for transcriptional regulation, and cis-trans elements with regulatory functions participate in the process of transcriptional regulation by binding to OCRs [3,4]

  • The positive dataset was obtained from the peaks of ATAC-seq and DNase-seq data, and the negative samples were generated by shuffling the positive samples, as described above
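The negative-sample construction mentioned in the last highlight can be sketched as follows. This is a minimal illustration under stated assumptions: the excerpt does not say which shuffling scheme was used, so a plain per-base shuffle (which preserves nucleotide composition but not dinucleotide frequencies) is assumed here, and the function names and example sequences are made up.

```python
import random

def shuffled_negative(seq, rng):
    """Return a random permutation of seq with the same base composition."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)

def build_dataset(positive_seqs, seed=0):
    """Pair each positive sequence (label 1) with one shuffled negative (label 0)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    data = [(s, 1) for s in positive_seqs]
    data += [(shuffled_negative(s, rng), 0) for s in positive_seqs]
    return data

# Made-up sequences standing in for ATAC-seq / DNase-seq peak regions.
positives = ["ACGTGGCATTAGC", "TTGACCGTAAGGC"]
dataset = build_dataset(positives)
```

Each negative has the same length and base counts as the positive it was derived from, so a classifier trained on such pairs has to rely on the order of bases rather than on composition alone.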


Summary

Introduction

Most of the chromatin is tightly coiled in the nucleus, but some regions, known as open chromatin regions (OCRs) or chromatin accessible regions, are loosely formed after chromatin remodeling. Whether the chromatin is loosely or tightly coiled largely determines transcriptional regulation [1,2]. A number of cis-regulatory elements interact with trans-acting factors for transcriptional regulation, and cis-trans elements with regulatory functions participate in the process of transcriptional regulation by binding to OCRs [3,4]. When a transcription factor binds to an OCR, it recruits other proteins to initiate the transcription of nearby genes. A complete genome-wide map of potential open chromatin loci is helpful for the investigation of changes in the nucleosome location and for the discovery of genome regulatory elements and gene regulatory mechanisms [5,6]. Chromatin accessibility information has even been proven to be valuable for the early diagnosis and treatment of cancer [7,8].

