Abstract

We present HiCNN2, a deep-learning package that learns the mapping between low-resolution and high-resolution Hi-C data (Hi-C is a technique for capturing genome-wide chromatin interactions) and can thereby enhance the resolution of Hi-C interaction matrices. The HiCNN2 package includes three methods, each with a different deep-learning architecture: HiCNN2-1 is based on a single convolutional neural network (ConvNet); HiCNN2-2 is an ensemble of two different ConvNets; and HiCNN2-3 is an ensemble of three different ConvNets. Our evaluation results indicate that HiCNN2-enhanced high-resolution Hi-C data achieve smaller mean squared error and higher Pearson’s correlation coefficients with experimental high-resolution Hi-C data than the existing methods HiCPlus and HiCNN. Moreover, all three HiCNN2 methods recover more of the significant interactions detected by Fit-Hi-C than HiCPlus and HiCNN do. Based on these results, we recommend HiCNN2-1 and HiCNN2-3 when the goal is to recover more significant interactions from Hi-C data, and HiCNN2-2 and HiCNN when the goal is to achieve higher reproducibility scores between the enhanced Hi-C matrix and the real high-resolution Hi-C matrix.

Highlights

  • These results indicate that (1) HiCNN2 consistently performs better than HiCNN and HiCPlus; (2) HiCNN2-1, an improved version of HiCNN, achieves smaller mean squared errors (MSE) and higher Pearson’s correlations than HiCNN; and (3) the three HiCNN2 architectures cannot be clearly ranked against one another, because their MSE values are similar.

  • HiCNN2 consists of three different architectures
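The two evaluation metrics named above, mean squared error and Pearson’s correlation, can be illustrated with a minimal sketch. It assumes the enhanced and experimental Hi-C matrices are equally sized nested lists of contact counts; the function names `mse` and `pearson` and the toy 3 x 3 matrices are hypothetical and are not taken from the HiCNN2 package.

```python
import math


def mse(pred, target):
    """Mean squared error over all entries of two equally sized matrices."""
    total = 0.0
    count = 0
    for row_p, row_t in zip(pred, target):
        for p, t in zip(row_p, row_t):
            total += (p - t) ** 2
            count += 1
    return total / count


def pearson(pred, target):
    """Pearson's correlation coefficient between the flattened matrices."""
    xs = [v for row in pred for v in row]
    ys = [v for row in target for v in row]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy, made-up contact matrices (illustrative values only).
enhanced = [[10, 4, 1], [4, 12, 5], [1, 5, 9]]
real = [[11, 5, 1], [5, 12, 4], [1, 4, 10]]

print("MSE:", mse(enhanced, real))
print("Pearson:", pearson(enhanced, real))
```

A lower MSE and a higher Pearson’s correlation both indicate that the enhanced matrix is closer to the experimental one. How the published evaluation selects the entries to compare (for example, by genomic distance) is not specified here, so this sketch simply uses every entry.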

Summary

Introduction

The population-cell Hi-C technique [1] can capture genome-wide intra- and inter-chromosomal contacts, which provide proximity information of the DNA and can be used to reconstruct the three-dimensional (3D) structures of chromosomes [2,3,4], define topologically associated domains (TADs) [5,6,7], and reveal significant genomic interactions [8,9]. To experimentally obtain high-resolution (e.g., 5 kb) Hi-C data, researchers need to generate more than one billion paired-end reads [9], which may incur a high sequencing cost. Computational methods for enhancing the resolution of Hi-C data are therefore a valuable alternative. In this research, we focus only on enhancing population-cell Hi-C data.
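To give a rough sense of why finer resolution demands so many more reads, the sketch below counts the bins and the intra-chromosomal bin pairs of a contact matrix at two resolutions. The 100 Mb chromosome length and the 40 kb comparison resolution are illustrative assumptions, and the helper names `num_bins` and `num_bin_pairs` are hypothetical.

```python
def num_bins(length_bp, resolution_bp):
    """Number of fixed-size bins needed to cover a chromosome (ceiling division)."""
    return -(-length_bp // resolution_bp)


def num_bin_pairs(n_bins):
    """Number of distinct bin pairs in a symmetric matrix, diagonal included."""
    return n_bins * (n_bins + 1) // 2


chrom_length = 100_000_000  # assumed 100 Mb chromosome, for illustration only

for resolution in (40_000, 5_000):
    bins = num_bins(chrom_length, resolution)
    pairs = num_bin_pairs(bins)
    print(f"{resolution // 1000} kb: {bins} bins, {pairs} bin pairs")
```

Under these assumptions, moving from 40 kb to 5 kb multiplies the number of bins by 8 and the number of bin pairs by roughly 64. A fixed number of reads is therefore spread much more thinly across the matrix at the finer resolution.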
