Abstract

Chromatin interaction studies can reveal how the genome is organized into spatially confined sub-compartments in the nucleus. However, accurately identifying sub-compartments from chromatin interaction data remains a challenge in computational biology. Here, we present Sub-Compartment Identifier (SCI), an algorithm that uses graph embedding followed by unsupervised learning to predict sub-compartments using Hi-C chromatin interaction data. We find that the network topological centrality and clustering performance of SCI sub-compartment predictions are superior to those of hidden Markov model (HMM) sub-compartment predictions. Moreover, using orthogonal Chromatin Interaction Analysis by in-situ Paired-End Tag Sequencing (ChIA-PET) data, we confirm that SCI sub-compartment prediction outperforms HMM. We show that SCI-predicted sub-compartments have distinct epigenetic marks, transcriptional activities, and transcription factor enrichment. In addition, we present a deep neural network to predict sub-compartments using epigenome, replication timing, and sequence data. Our neural network yields more accurate sub-compartment predictions when SCI-determined sub-compartments are used as labels for training.

Highlights

  • Chromatin interaction studies can reveal how the genome is organized into spatially confined sub-compartments in the nucleus

  • We found that ChIP-seq-based transcription factor (TF) enrichment patterns agree with the predicted sub-compartment-specific TFs: MYC and BRCA1 are enriched in C1, and BATF and EP300 are enriched in C2 (Fig. 4c)

  • Software exists for data pre-processing, chromatin loop calling, and topologically associating domain (TAD) predictions


Summary

Introduction

Chromatin interaction studies can reveal how the genome is organized into spatially confined sub-compartments in the nucleus. We present Sub-Compartment Identifier (SCI), an algorithm that uses graph embedding followed by unsupervised learning to predict sub-compartments using Hi-C chromatin interaction data. The latest model proposes five sub-compartments, using a hidden Markov model (HMM) to cluster inter-chromosomal interactions[5]. This method identifies two active sub-compartments (A1 and A2) and three inactive sub-compartments (B1, B2, and B3) using deeply sequenced GM12878 cell line data. TSA-Seq, a new genome-wide mapping method that estimates mean distances of chromosomal loci from nuclear structures, was used to predict chromosome trajectories of several Mbp between nuclear structures[7]. These findings use sub-compartment assignments either as output labels for predictive models or for evaluation of the distances between nuclear speckles. Using SCI output, we developed an epigenome-based (DNA methylation and histone modification) deep neural network model for sub-compartment classification.
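
To make the "graph embedding followed by unsupervised learning" idea concrete, here is a minimal sketch in plain Python. It is not the SCI implementation: the text above does not specify the embedding or the clustering method, so this sketch assumes a simple spectral embedding (computed by orthogonal iteration) followed by k-means. The toy contact matrix, all function names, and all parameter values are hypothetical and chosen only for illustration.

```python
import math
import random


def shifted_normalized(contacts):
    """Return (I + D^-1/2 A D^-1/2) / 2, whose eigenvalues lie in [0, 1]."""
    n = len(contacts)
    deg = [sum(row) for row in contacts]  # assumes every bin has contacts
    return [
        [0.5 * ((i == j) + contacts[i][j] / math.sqrt(deg[i] * deg[j]))
         for j in range(n)]
        for i in range(n)
    ]


def orthonormalize(vectors):
    """Gram-Schmidt: make the vectors mutually orthogonal with unit length."""
    basis = []
    for v in vectors:
        for b in basis:
            proj = sum(x * y for x, y in zip(v, b))
            v = [x - proj * y for x, y in zip(v, b)]
        norm = math.sqrt(sum(x * x for x in v))
        basis.append([x / norm for x in v])
    return basis


def spectral_embedding(contacts, dim, iters=200, seed=0):
    """Embed each genomic bin using the leading non-trivial eigenvectors."""
    m = shifted_normalized(contacts)
    rng = random.Random(seed)
    n = len(contacts)
    vecs = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(dim + 1)]
    for _ in range(iters):  # orthogonal (subspace) iteration
        vecs = orthonormalize(
            [[sum(a * b for a, b in zip(row, v)) for row in m] for v in vecs]
        )
    # vecs[0] converges to the trivial degree eigenvector; keep the next `dim`.
    return [[vecs[d][i] for d in range(1, dim + 1)] for i in range(n)]


def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=20):
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(dist2(p, c) for c in centers))
        )
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(x) / len(members) for x in zip(*members)]
    return labels


def toy_contacts(block_size=3, within=10.0, between=1.0):
    """Hypothetical contact matrix: two groups of bins that interact
    strongly within a group and weakly across groups."""
    n = 2 * block_size
    return [
        [0.0 if i == j
         else (within if i // block_size == j // block_size else between)
         for j in range(n)]
        for i in range(n)
    ]


contacts = toy_contacts()
labels = kmeans(spectral_embedding(contacts, dim=1), k=2)
print(labels)
```

On this toy matrix, bins 0 to 2 receive one label and bins 3 to 5 the other, mirroring how bins with similar interaction patterns would be grouped into one sub-compartment. Real Hi-C matrices are far larger and noisier, so a practical pipeline would need proper normalization and a scalable embedding method.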

