Abstract

The nature of genome organization into two basic structural compartments is not yet understood, although it has been suggested to be a mechanism of gene expression regulation. Using a classification approach, we ranked genomic marks that indicate compartmentalization. We considered a broad range of marks in GM12878 cells, including GC content, histone modifications, DNA-binding proteins, open chromatin, transcription and genome regulatory segmentation. Genomic marks were defined over CTCF or RNAPII loops, which are basic elements of genome 3D structure, and over 100 kb genomic windows. We carried out experiments to empirically assess the whole feature set, as well as the individual features, in classifying loops/windows into compartment A or B. Using Monte Carlo Feature Selection and Analysis of Variance, we constructed a ranking of feature importance for classification. The best simple indicator of compartmentalization is the DNase-seq open chromatin measurement for CTCF loops, H3K4me1 for RNAPII loops and H3K79me2 for genomic windows. Among DNA-binding proteins, the best indicator is the RUNX3 transcription factor for loops and RNAPII for genomic windows. Chromatin state prediction methods that indicate active elements such as promoters, enhancers or heterochromatin improve the prediction of loop segregation into compartments. However, the histone modifications H3K9me3, H4K20me1 and H3K27me3, as well as GC content, are poor indicators of compartments.
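The Analysis of Variance part of the ranking can be illustrated with a minimal sketch. This is not the authors' pipeline (which also uses Monte Carlo Feature Selection); it only shows how a one-way ANOVA F statistic could order features by how well they separate compartment A from compartment B. The function names, feature names and toy values are hypothetical.

```python
from statistics import mean


def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of numeric values."""
    k = len(groups)                      # number of classes (here: A and B)
    n = sum(len(g) for g in groups)      # total number of observations
    grand = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    if ss_within == 0:
        # No within-class spread: perfect separation, or a constant feature.
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_features(features, labels):
    """Rank features by F statistic, highest first.

    features: dict mapping feature name -> list of values (one per loop/window)
    labels:   list of compartment labels ("A" or "B"), same order as the values
    """
    scores = {}
    for name, values in features.items():
        by_class = {}
        for value, label in zip(values, labels):
            by_class.setdefault(label, []).append(value)
        scores[name] = anova_f(list(by_class.values()))
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: six loops, three in each compartment.
toy_labels = ["A", "A", "A", "B", "B", "B"]
toy_features = {
    "open_chromatin": [0.90, 0.80, 0.85, 0.10, 0.20, 0.15],
    "gc_content": [0.40, 0.50, 0.45, 0.42, 0.50, 0.44],
}
ranking = rank_features(toy_features, toy_labels)
```

In this toy setup the feature whose values differ most between the two compartments, relative to its spread within each compartment, ends up first in `ranking`.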

Highlights

  • The genetic information of eukaryotes is stored in a cell nucleus in the form of a nucleoprotein complex of DNA and histones, known as chromatin

  • We collected a range of genomic data that characterize chromatin state and chromatin-binding proteins in the GM12878 cell line

  • The most valuable representation of these features is, for loops, the fraction of the loop that is covered by chromatin immunoprecipitation (ChIP)-seq peaks and, for genomic windows, the mean of the whole signal
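
The two feature representations named in the last highlight can be sketched as follows. This is an illustrative sketch rather than the authors' implementation. It assumes half-open (BED-like) coordinates on a single chromosome, treats bases without signal as zero, and uses hypothetical function names and toy intervals.

```python
def covered_fraction(region, peaks):
    """Fraction of a region (start, end) covered by the union of peaks.

    Overlapping peaks are merged so that no base is counted twice.
    """
    start, end = region
    if end <= start:
        raise ValueError("region must have positive length")
    # Keep only peaks that overlap the region and clip them to its bounds.
    clipped = sorted(
        (max(s, start), min(e, end)) for s, e in peaks if s < end and e > start
    )
    covered = 0
    cur_start, cur_end = None, None
    for s, e in clipped:
        if cur_end is None or s > cur_end:
            # Gap before this peak: close the previous merged block.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            # Overlapping or adjacent peak: extend the current block.
            cur_end = max(cur_end, e)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / (end - start)


def mean_signal(region, signal):
    """Length-weighted mean of a piecewise-constant signal over a region.

    signal: list of non-overlapping (start, end, value) intervals.
    Bases not covered by any interval contribute zero (an assumption).
    """
    start, end = region
    if end <= start:
        raise ValueError("region must have positive length")
    total = 0.0
    for s, e, value in signal:
        overlap = min(e, end) - max(s, start)
        if overlap > 0:
            total += overlap * value
    return total / (end - start)


# Hypothetical loop and peaks: 50 of 100 bases are covered after merging.
loop_feature = covered_fraction((100, 200), [(90, 120), (110, 130), (180, 250)])
# Hypothetical window with two signal segments of equal length.
window_feature = mean_signal((0, 100), [(0, 50, 2.0), (50, 100, 4.0)])
```

Merging overlapping peaks before summing keeps the loop feature within the range 0 to 1, which makes it comparable across loops of different lengths.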


Introduction

The genetic information of eukaryotes is stored in the cell nucleus in the form of a nucleoprotein complex of DNA and histones, known as chromatin. A basic aspect of chromatin structure is that each chromosome occupies a discrete volume, forming a “chromosome territory” [1]. Proximity-based ligation techniques coupled with massively parallel sequencing (Hi-C) have provided evidence for topologically associating domains (TADs). A TAD is defined as a region of a chromosome that shares many interactions within it, but significantly fewer interactions with adjacent and more distal TADs [2,3]. The concept of TADs with a size of ~1 Mbp is in concordance with microscopic evidence [4]. The Hi-C map of the human genome at a higher, kilobase resolution reveals the inner structure of TADs [5]. The observed domains ranged in size from 40 kb to 3 Mb
