Abstract

The three-dimensional (3D) organization of the genome influences diverse nuclear processes. Chromatin interaction analysis by paired-end tag sequencing (ChIA-PET) and high-throughput chromatin conformation capture (Hi-C) are powerful methods for studying 3D genome organization. However, both experiments are costly and time-consuming, require tens to hundreds of millions of cells, and are challenging to optimize and analyze. Predicting ChIA-PET/Hi-C data from cheaper ChIP-Seq data and other easily obtainable features is a useful alternative. It is also well established that the cohesin protein complex is a key determinant of 3D genome organization. Considering this, we present Chromatin Interaction Predictor (ChIPr), a suite of regression models based on deep neural networks (DNN), random forest, and gradient boosting. ChIPr predicts the cohesin-mediated chromatin interaction strength between any two given loci in the genome. It was trained mainly on two top-tier ENCODE cell lines, GM12878 and K562, and tested on other cell lines (H1 and HepG2). Comprehensive tests showed that ChIPr predictions correlated well with the original ChIA-PET data at peak-level resolution and at bin sizes of 25 kb and 5 kb. In addition, ChIPr accurately captured most of the cell-type-dependent loops identified by both ChIA-PET and Hi-C data. Extensive feature testing highlighted genomic distance and RAD21 (a cohesin component) ChIP-Seq signals as the most important inputs for determining chromatin interaction strength. A standard ChIPr model requires three experimental inputs: ChIP-Seq signals for RAD21, H3K27ac (an enhancer/active chromatin mark), and H3K27me3 (an inactive chromatin mark). A reduced model requires a single experimental input, ChIP-Seq signals for RAD21, and performs equally well.
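To make the model inputs described above concrete, the following is a minimal sketch of how a feature vector for one pair of loci might be assembled before being passed to a regressor. It is an illustration only, not the ChIPr implementation: the `Anchor` class, the `pair_features` function, the log-scaling of distance, and the feature ordering are all assumptions introduced here.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Anchor:
    """One ChIP-Seq peak or bin used as an interaction anchor (hypothetical structure)."""
    chrom: str
    start: int
    end: int
    rad21: float           # RAD21 ChIP-Seq signal at this anchor
    h3k27ac: float = 0.0   # H3K27ac signal (standard model only)
    h3k27me3: float = 0.0  # H3K27me3 signal (standard model only)

    @property
    def midpoint(self) -> float:
        return (self.start + self.end) / 2


def pair_features(a: Anchor, b: Anchor, reduced: bool = False) -> List[float]:
    """Build an illustrative feature vector for a pair of anchors.

    Assumed layout: [log10(distance + 1), RAD21 left, RAD21 right] for the
    reduced model, plus H3K27ac and H3K27me3 at both anchors for the
    standard model. The log transform is an assumption of this sketch.
    """
    if a.chrom != b.chrom:
        raise ValueError("this sketch only handles intra-chromosomal pairs")
    # Order the anchors by position so the vector does not depend on argument order.
    left, right = sorted((a, b), key=lambda anchor: anchor.midpoint)
    distance = right.midpoint - left.midpoint
    features = [math.log10(distance + 1), left.rad21, right.rad21]
    if not reduced:
        features += [left.h3k27ac, right.h3k27ac, left.h3k27me3, right.h3k27me3]
    return features
```

Under these assumptions, a vector built with `reduced=True` needs only RAD21 ChIP-Seq signal and anchor coordinates, which mirrors the single-experiment reduced model mentioned in the abstract; any of the three regressor families could then be fit on such vectors.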
Additionally, we extended ChIPr to predict contact maps for several prostate cancer (PCa) cell lines whose ChIA-PET data were not available on the ENCODE portal, for instance, RWPE1 and VCaP. We generated the necessary ChIP-Seq data for these cell lines and used them as model input to examine changes in the 3D interactions of PCa driver genes and their isolated neighbourhoods, and to gain biological insights into the gene regulatory networks involved. Integrative analysis revealed novel insights into the role of the CTCF motif, its orientation, and CTCF binding in the prevalence and strength of cohesin-mediated chromatin interactions: approximately 50% to 80% of the RAD21 interactions were enriched with CTCF, and fewer than 15% of the interactions had no CTCF ChIP-Seq binding at either of the two anchor peaks. Lastly, we noticed a subset of RAD21 interactions with CTCF binding in only one or neither of the two anchor peaks, and this subset was significantly enriched with enhancers. Taken together, this study outlines the general features of genome folding and opens new avenues for analyzing spatial genome organization in specimens with limited cell numbers.

Citation Format: Khyati Chandratre. Accurate prediction of cohesin-mediated 3D genome organization using 2D chromatin features [abstract]. In: Proceedings of the AACR Special Conference: Advances in Prostate Cancer Research; 2023 Mar 15-18; Denver, Colorado. Philadelphia (PA): AACR; Cancer Res 2023;83(11 Suppl):Abstract nr A054.