Abstract
We present a deep-learning package named HiCNN2 that learns the mapping between low-resolution and high-resolution Hi-C data (Hi-C is a technique for capturing genome-wide chromatin interactions) and can thereby enhance the resolution of Hi-C interaction matrices. The HiCNN2 package includes three methods, each with a different deep learning architecture: HiCNN2-1 is based on a single convolutional neural network (ConvNet); HiCNN2-2 is an ensemble of two different ConvNets; and HiCNN2-3 is an ensemble of three different ConvNets. Our evaluation results indicate that, compared with the existing methods HiCPlus and HiCNN, HiCNN2-enhanced high-resolution Hi-C data achieve smaller mean squared errors and higher Pearson’s correlation coefficients with experimental high-resolution Hi-C data. Moreover, all three HiCNN2 methods recover more of the significant interactions detected by Fit-Hi-C than HiCPlus and HiCNN do. Based on these results, we recommend HiCNN2-1 and HiCNN2-3 if the goal is to recover more significant interactions from Hi-C data, and HiCNN2-2 and HiCNN if the goal is to achieve higher reproducibility scores between the enhanced Hi-C matrix and the real high-resolution Hi-C matrix.
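The ensemble idea behind HiCNN2-2 and HiCNN2-3 (several ConvNets whose outputs are merged into one enhanced matrix) can be illustrated with a minimal sketch. It assumes each trained ConvNet is simply a function from a low-resolution patch to an enhanced patch, and it merges member outputs with a normalized weighted average; that combination rule, and the names `ensemble_predict`, `Model`, `identity`, and `doubler`, are hypothetical illustrations rather than HiCNN2's documented method or API.

```python
from typing import Callable, List, Sequence

Matrix = List[List[float]]
# Hypothetical stand-in for a trained ConvNet: maps a low-res patch to an enhanced patch.
Model = Callable[[Matrix], Matrix]


def ensemble_predict(models: Sequence[Model], weights: Sequence[float], low_res: Matrix) -> Matrix:
    """Combine member predictions by a normalized weighted average.

    The weighted-average rule is an illustrative assumption, not HiCNN2's documented method.
    """
    if not models or len(models) != len(weights):
        raise ValueError("need at least one model and exactly one weight per model")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    outputs = [model(low_res) for model in models]
    # Members may shrink the patch (e.g. unpadded convolutions), so use the output shape.
    rows, cols = len(outputs[0]), len(outputs[0][0])
    if any(len(o) != rows or any(len(r) != cols for r in o) for o in outputs):
        raise ValueError("all members must return matrices of the same shape")
    combined = [[0.0] * cols for _ in range(rows)]
    for out, w in zip(outputs, weights):
        for i in range(rows):
            for j in range(cols):
                combined[i][j] += w * out[i][j] / total
    return combined


def identity(m: Matrix) -> Matrix:
    """Toy member (hypothetical): returns a copy of its input."""
    return [row[:] for row in m]


def doubler(m: Matrix) -> Matrix:
    """Toy member (hypothetical): scales every entry by 2."""
    return [[2.0 * x for x in row] for row in m]


if __name__ == "__main__":
    print(ensemble_predict([identity, doubler], [1.0, 1.0], [[1.0, 2.0], [3.0, 4.0]]))
    # [[1.5, 3.0], [4.5, 6.0]]
```

With equal weights the two toy members average to 1.5 times the input; in a real ensemble the members would be different trained ConvNets and the weights could themselves be learned.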
Highlights
These results indicate that (1) HiCNN2 consistently performs better than HiCNN and HiCPlus; (2) HiCNN2-1, an improved version of HiCNN, achieves smaller mean squared errors (MSEs) and higher Pearson’s correlations than HiCNN; and (3) it is difficult to determine which of the three HiCNN2 architectures is best, as their MSEs are similar.
HiCNN2 consists of three different architectures.
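The comparisons above rest on two metrics computed between an enhanced matrix and the experimental high-resolution matrix: mean squared error (MSE) and Pearson's correlation coefficient. A minimal stdlib sketch of both follows. It also includes a helper that extracts one diagonal, i.e. all bin pairs at a fixed genomic distance, because Hi-C correlations are often reported per distance; whether HiCNN2's evaluation uses the whole matrix or per-distance diagonals is an assumption here, and the function names are illustrative.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def flatten(m: Matrix) -> List[float]:
    """Row-major list of all matrix entries."""
    return [x for row in m for x in row]


def mse(pred: Matrix, target: Matrix) -> float:
    """Mean squared error over all matrix entries."""
    p, t = flatten(pred), flatten(target)
    if not p or len(p) != len(t):
        raise ValueError("matrices must be non-empty and the same size")
    return sum((a - b) ** 2 for a, b in zip(p, t)) / len(p)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient between two equal-length vectors."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two equal-length vectors with at least 2 entries")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(vx * vy)


def diagonal(m: Matrix, distance: int) -> List[float]:
    """Entries at a fixed genomic distance: the distance-th upper diagonal."""
    return [m[i][i + distance] for i in range(len(m) - distance)]
```

For a whole-matrix score one would call `pearson(flatten(enhanced), flatten(real))`; for a per-distance score, `pearson(diagonal(enhanced, d), diagonal(real, d))` for each distance `d` of interest.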
Summary
The population-cell Hi-C technique [1] can capture genome-wide intra- and inter-chromosomal contacts, which provide proximity information of the DNA and can be used to reconstruct the three-dimensional (3D) structures of chromosomes [2,3,4], define topologically associated domains (TADs) [5,6,7], and reveal significant genomic interactions [8,9]. To experimentally obtain high-resolution (e.g., 5 kb) Hi-C data, researchers need to generate more than one billion paired-end reads [9], which can incur a high sequencing cost. Computational methods for enhancing the resolution of Hi-C data are therefore indispensable. In this research, we focus only on enhancing population-cell Hi-C data.
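One way to build intuition for why higher resolution demands so many more reads is a back-of-envelope count: halving the bin size doubles the number of bins along a chromosome and roughly quadruples the number of distinct bin pairs over which the same reads are spread. The sketch below assumes a hypothetical 100 Mb chromosome and uses illustrative function names; it is an intuition aid, not a derivation of the one-billion-read figure cited from [9].

```python
def num_bins(chrom_length_bp: int, resolution_bp: int) -> int:
    """Number of fixed-size bins needed to tile a chromosome (ceiling division)."""
    return -(-chrom_length_bp // resolution_bp)


def unique_bin_pairs(n_bins: int) -> int:
    """Cells in the upper triangle, diagonal included, of a symmetric n x n contact matrix."""
    return n_bins * (n_bins + 1) // 2


if __name__ == "__main__":
    length = 100_000_000  # hypothetical 100 Mb chromosome
    for res in (10_000, 5_000):
        n = num_bins(length, res)
        print(f"{res // 1000} kb: {n} bins, {unique_bin_pairs(n):,} bin pairs")
    # 10 kb: 10000 bins, 50,005,000 bin pairs
    # 5 kb: 20000 bins, 200,010,000 bin pairs
```

Going from 10 kb to 5 kb roughly quadruples the number of matrix cells, so keeping the same average coverage per cell would need about four times as many informative read pairs.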