Abstract

To better understand DNA’s 3D folding in cell nuclei, researchers developed chromosome capture methods such as Hi-C that measure the contact frequencies between all DNA segment pairs across the genome. As Hi-C data sets often are massive, it is common to use bioinformatics methods to group DNA segments into 3D regions with correlated contact patterns, such as Topologically associated domains and A/B compartments. Recently, another research direction emerged that treats the Hi-C data as a network of 3D contacts. In this representation, one can use community detection algorithms from complex network theory that group nodes into tightly connected mesoscale communities. However, because Hi-C networks are so densely connected, several node partitions may represent feasible solutions to the community detection problem but are indistinguishable unless including other data. Because this limitation is a fundamental property of the network, this problem persists regardless of the community-finding or data-clustering method. To help remedy this problem, we developed a method that charts the solution landscape of network partitions in Hi-C data from human cells. Our approach allows us to scan seamlessly through the scales of the network and determine regimes where we can expect reliable community structures. We find that some scales are more robust than others and that strong clusters may differ significantly. Our work highlights that finding a robust community structure hinges on thoughtful algorithm design or method cross-evaluation.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call