Abstract

Background: High-throughput sequencing Chromosome Conformation Capture (Hi-C) allows the study of DNA interactions and 3D chromosome folding at the genome-wide scale. Usually, these data are represented as matrices describing the binary contacts among the different chromosome regions. On the other hand, a graph-based representation can be advantageous to describe the complex topology achieved by the DNA in the nucleus of eukaryotic cells.

Methods: Here we discuss the use of a graph database for storing and analysing data obtained from Hi-C experiments. The main issue is the size of the produced data and, when working with a graph-based representation, the consequent need to adequately manage a large number of edges (contacts) connecting nodes (genes), which represent the sources of information. For this reason, currently available graph visualisation tools and libraries fall short with Hi-C data. The use of graph databases, instead, supports both the analysis and the visualisation of the spatial patterns present in Hi-C data, in particular for efficiently comparing different experiments or for re-mapping omics data in a space-aware context. Moreover, the possibility of describing graphs through statistical indicators and, even more, the capability of correlating them through statistical distributions makes it possible to highlight similarities and differences among Hi-C experiments performed in different cell conditions or different cell types.

Results: These concepts have been implemented in NeoHiC, an open-source and user-friendly web application for the progressive visualisation and analysis of Hi-C networks based on the Neo4j graph database (version 3.5).

Conclusion: With the accumulation of more experiments, the tool will provide invaluable support for comparing the neighbours of genes across experiments and conditions, helping to highlight changes in functional domains and to identify new co-organised genomic compartments.
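As an illustration of what comparing Hi-C graphs "through statistical distributions" could look like, the following minimal Python sketch computes the degree distribution of two contact graphs and the distance between them. The abstract does not say which indicators NeoHiC uses, so the choice of node degree as the indicator and of total variation distance as the comparison is an assumption, and the gene names and edge lists are invented toy data.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Edge = Tuple[str, str]


def degree_distribution(edges: Iterable[Edge]) -> Dict[int, float]:
    """Return the fraction of nodes having each degree (illustrative indicator)."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    counts = Counter(degree.values())  # degree value -> number of nodes
    total = sum(counts.values())
    return {k: v / total for k, v in sorted(counts.items())}


def total_variation(p: Dict[int, float], q: Dict[int, float]) -> float:
    """Total variation distance between two discrete distributions (0 = identical)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# Hypothetical toy contact graphs for two experiments (not real Hi-C data).
exp_a = [("g1", "g2"), ("g1", "g3"), ("g2", "g3"), ("g3", "g4")]
exp_b = [("g1", "g2"), ("g3", "g4")]

dist_a = degree_distribution(exp_a)
dist_b = degree_distribution(exp_b)
distance = total_variation(dist_a, dist_b)
```

A distance close to 0 would suggest similar contact patterns under this indicator, while values close to 1 would suggest very different ones; any real analysis would need indicators appropriate to the biological question.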

Highlights

  • High-throughput sequencing Chromosome Conformation Capture (Hi-C) allows the study of DNA interactions and 3D chromosome folding at the genome-wide scale

  • The output of a high-throughput sequencing Chromosome Conformation Capture (Hi-C) analysis is a list of paired genomic regions along the different chromosomes, which can be represented as a square matrix X, where X_ij is the sum of read pairs matching positions i and j, respectively

  • This gene-centric view is of particular interest for making Hi-C experiments a common ground for integrating multi-omics features, highlighting, from a systems biology perspective, the pathways and transcriptional programs regulated by the genome conformation
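The relation between the square contact matrix X and a graph-based representation can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: matrix indices are used directly as nodes (the paper speaks of genes as nodes), the matrix is taken to be symmetric, self-contacts on the diagonal are ignored, and the `min_reads` threshold and the toy matrix are hypothetical.

```python
from typing import Dict, List, Tuple

WeightedEdge = Tuple[int, int, int]  # (position i, position j, read-pair count)


def matrix_to_edges(x: List[List[int]], min_reads: int = 1) -> List[WeightedEdge]:
    """Convert a symmetric contact matrix into a weighted edge list.

    Only the upper triangle (i < j) is scanned, so each contact appears once
    and diagonal self-contacts are skipped (an assumption of this sketch).
    """
    n = len(x)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            if x[i][j] >= min_reads:
                edges.append((i, j, x[i][j]))
    return edges


def neighbours(edges: List[WeightedEdge]) -> Dict[int, Dict[int, int]]:
    """Build an adjacency map: node -> {neighbour: read-pair count}."""
    adj: Dict[int, Dict[int, int]] = {}
    for i, j, w in edges:
        adj.setdefault(i, {})[j] = w
        adj.setdefault(j, {})[i] = w
    return adj


# Hypothetical 4x4 contact matrix (not real Hi-C data).
X = [
    [0, 5, 0, 2],
    [5, 0, 1, 0],
    [0, 1, 0, 7],
    [2, 0, 7, 0],
]

edges = matrix_to_edges(X, min_reads=2)
adj = neighbours(edges)
```

Storing only the non-zero (or above-threshold) entries as edges is what makes a graph store attractive for sparse contact data; the adjacency map gives direct access to the neighbours of a given node, which is the kind of lookup a graph database is designed to serve at scale.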


Summary

Introduction

High-throughput sequencing Chromosome Conformation Capture (Hi-C) allows the study of DNA interactions and 3D chromosome folding at the genome-wide scale. Modern bioinformatics aims at integrating different omics data to shed light on the mechanisms of gene expression and regulation that give rise to different phenotypes, in order to understand the underlying molecular processes that sustain life and to intervene in these processes by developing new drugs [1, 2] when pathological changes occur [3, 4]. In this context, the exploration of the 3D organisation of chromosomes in the nucleus of cells is of paramount importance for many cellular processes related to gene expression regulation, including DNA accessibility, epigenetic patterns, and chromosome translocations [5, 6]. A graph-based representation is an effective complement to the traditional matrix-based representation, as produced, for example, by Juicer [12] or TADbit [13].

