SRHiC: A Deep Learning Model to Enhance the Resolution of Hi-C Data

Zhilan Li, Zhiming Dai

doi:10.3389/fgene.2020.00353

Abstract

Hi-C data is important for studying the three-dimensional structure of chromatin. However, the resolution of most existing Hi-C data is coarse because of the cost of sequencing. It would therefore be helpful to predict high-resolution Hi-C data from low-coverage sequencing data. Here we developed a novel and simple deep learning-based computational method, named super-resolution Hi-C (SRHiC), to enhance the resolution of Hi-C data. We validated SRHiC on Hi-C data from a human cell line. We also evaluated the generalization power of SRHiC by enhancing the resolution of Hi-C data in other human and mouse cell types. The results showed that SRHiC outperforms state-of-the-art methods in prediction accuracy.
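Supervised super-resolution models of this kind are typically trained on paired low- and high-coverage maps. The abstract does not say how SRHiC's low-coverage inputs were produced; one common approach, assumed here purely for illustration, is to randomly downsample the read pairs of a deeply sequenced map. The sketch below shows that idea as binomial thinning of a small contact matrix. The helper `downsample_contacts`, the example matrix, and the 1/16 fraction are hypothetical choices, not part of SRHiC.

```python
import random


def downsample_contacts(matrix, fraction, seed=0):
    """Keep each read pair independently with probability `fraction`.

    `matrix` is a symmetric n x n list of lists of non-negative integer read
    counts. Only the upper triangle (diagonal included) is sampled; each kept
    count is mirrored into the lower triangle so the result stays symmetric.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)  # seeded so the sampling is reproducible
    n = len(matrix)
    out = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # Binomial thinning: each of the matrix[i][j] read pairs survives
            # with probability `fraction`.
            kept = sum(1 for _ in range(matrix[i][j]) if rng.random() < fraction)
            out[i][j] = kept
            out[j][i] = kept
    return out


# Hypothetical 3 x 3 "high-coverage" map thinned to 1/16 of its reads.
high = [[10, 4, 0], [4, 8, 2], [0, 2, 6]]
low = downsample_contacts(high, 1 / 16, seed=42)
print(low)
```

Because every read pair is kept or dropped independently, the thinned map has the same expected structure as the original at a fraction of the coverage, which is what makes it usable as a simulated low-coverage input.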

Highlights

We found that HiCNN required much longer training time than SRHiC and HiCPlus (Supplementary Table S1).
The training time required by HiCNN was nearly 17.6 times that of HiCPlus, and that required by SRHiC was 2.9 times that of HiCPlus.
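Taking the two training-time ratios in the highlights at face value, HiCNN's training time relative to SRHiC follows directly. The symbol $T$ (training time of each model) is introduced here only for this calculation, and because it is derived from rounded ratios the result is approximate.

```latex
\frac{T_{\mathrm{HiCNN}}}{T_{\mathrm{SRHiC}}}
  = \frac{T_{\mathrm{HiCNN}} / T_{\mathrm{HiCPlus}}}{T_{\mathrm{SRHiC}} / T_{\mathrm{HiCPlus}}}
  \approx \frac{17.6}{2.9} \approx 6.1
```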

Summary

Introduction

Chromatin three-dimensional (3D) structure is vital to biological processes such as genome replication, DNA mutation and repair, and transcription (Cremer and Cremer, 2001; Bonev and Cavalli, 2016). The advent of the high-throughput chromosome conformation capture (Hi-C) technique made it possible to measure all pairwise interactions across the entire genome (Lieberman-Aiden et al., 2009). Hi-C data is usually represented as a contact matrix $M_{n \times n}$, where $M_{i,j}$ is the number of observed interactions (read-pair count) between genomic regions (bins) $i$ and $j$. The size of each bin (e.g., 10 kb) is called the resolution of the Hi-C contact matrix. A linear increase in resolution requires a quadratic increase in the total number of sequencing reads, because halving the bin size doubles $n$ and therefore quadruples the number of matrix entries among which the reads are distributed. Computational methods that predict high-resolution Hi-C contact maps from low-resolution Hi-C data are therefore needed to address this issue.
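The contact-matrix definition and the quadratic cost of resolution can be made concrete with a small sketch that bins read pairs from a single chromosome into a symmetric matrix and counts its entries at two resolutions. The helper names `contact_matrix` and `num_entries`, the 0-based base-pair coordinates, and the hypothetical 100 Mb chromosome length are illustrative assumptions, not part of SRHiC.

```python
def contact_matrix(read_pairs, chrom_length, resolution):
    """Bin intra-chromosomal read pairs into a symmetric n x n contact matrix.

    read_pairs: iterable of (pos1, pos2) 0-based base-pair coordinates on one
        chromosome, each assumed to satisfy 0 <= pos < chrom_length.
    resolution: bin size in base pairs (e.g. 10_000 for 10 kb).
    """
    n = -(-chrom_length // resolution)  # ceiling division: number of bins
    m = [[0] * n for _ in range(n)]
    for pos1, pos2 in read_pairs:
        i, j = pos1 // resolution, pos2 // resolution
        m[i][j] += 1
        if i != j:
            m[j][i] += 1  # mirror off-diagonal counts to keep M symmetric
    return m


def num_entries(chrom_length, resolution):
    """Number of cells in the n x n matrix at a given bin size."""
    n = -(-chrom_length // resolution)
    return n * n


# Three hypothetical read pairs on a 40 kb region binned at 10 kb (n = 4).
pairs = [(1_500, 12_000), (25_000, 26_000), (3_000, 38_000)]
for row in contact_matrix(pairs, chrom_length=40_000, resolution=10_000):
    print(row)

# Going from 40 kb to 10 kb bins (4x finer) on a 100 Mb chromosome.
coarse = num_entries(100_000_000, 40_000)
fine = num_entries(100_000_000, 10_000)
print(fine // coarse)  # 16
```

A 4-fold finer bin size yields 16 times as many matrix entries, so keeping roughly the same average number of reads per entry requires about 16 times as many sequencing reads.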

Methods

Results

Conclusion