Abstract
Background: Assays capable of detecting genome-wide chromatin interactions have produced massive amounts of data and greatly advanced our understanding of chromosomal three-dimensional (3D) structure. As the technology becomes more sophisticated, data of ever higher resolution are being produced, going from the initial 1 megabase (Mb) resolution to the current 10 kilobase (Kb) or even 1 Kb resolution. The availability of genome-wide interaction data necessitates the development of analytical methods to recover the underlying 3D spatial chromatin structure, but challenges abound. Most existing methods were proposed for analyzing data at low resolution (1 Mb), so their behavior on higher-resolution data is unknown. A key feature of such data is the high proportion of “0” contact counts, that is, an excess of zeros.

Results: To address the excess of zeros, we propose a truncated Random effect EXpression (tREX) method that can handle data at various resolutions. We then assess the performance of tREX and a number of leading existing methods for recovering the underlying chromatin 3D structure by creating in-silico data that mimic multiple levels of resolution and subjecting the methods to a “stress test”. Finally, we apply tREX and the comparison methods to a Hi-C dataset for which FISH measurements are available to evaluate estimation accuracy.

Conclusion: The proposed tREX method achieves consistently good performance in all 30 simulated settings considered. It is not only robust to resolution level and underlying parameters, but also insensitive to model misspecification. This conclusion is based on observations made in terms of 3D structure estimation accuracy and preservation of topologically associated domains. Application of the methods to the human lymphoblastoid cell line data on chromosomes 14 and 22 further substantiates the superior performance of tREX: the 3D structure constructed by tREX is consistent with the FISH measurements, and the corresponding distances predicted by tREX correlate more highly with the FISH measurements than those of any of the comparison methods.

Software: An open-source R package is available at http://www.stat.osu.edu/~statgen/Software/tRex.

Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-0894-z) contains supplementary material, which is available to authorized users.
Highlights
Assays capable of detecting genome-wide chromatin interactions have produced massive amounts of data and greatly advanced our understanding of chromosomal three-dimensional (3D) structure.
An in-silico study: using an existing estimated structure [16] as the “gold standard”, we consider several scenarios that mimic various data resolutions. This 3D structure is selected because it is estimated from real data and depicts two topologically associated domains (TADs) (Fig. 3(a)), an important feature for gauging the relative performance of the methods subjected to the “stress test”.
In addition to truncated Random effect EXpression (tREX), the following methods are subjected to the stress test: truncated Poisson Architecture Model (tPAM) [16], BACH [13], PASTIS [14], ShRec3D [10], and ChromSDE [9]. The first three are model-based methods like tREX, and the remaining two are optimization-based.
Summary
Assays capable of detecting genome-wide chromatin interactions have produced massive amounts of data and greatly advanced our understanding of chromosomal three-dimensional (3D) structure. The availability of genome-wide interaction data necessitates the development of analytical methods to recover the underlying 3D spatial chromatin structure, but challenges abound. A number of analytical approaches have been proposed to recapitulate the underlying 3D structure, most of them developed for Hi-C data. These approaches can generally be classified as optimization-based or model-based. ShRec3D [10] falls into the optimization-based category, except that, in its first step, the contact counts are converted to distances not simply by applying the inverse relationship of the biophysical model, but by finding the “shortest path” connecting two nodes on a weighted graph. Many of the optimization methods use metric or non-metric multi-dimensional scaling to minimize the objective function [8, 10, 11].
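To make the two ways of turning contact counts into distances concrete, the following is a minimal sketch, not the ShRec3D implementation. It assumes a simple inverse power-law relationship d = c^(-alpha) with alpha = 1 (the exponent, the function names, and the toy count matrix are all illustrative assumptions), treats zero counts as missing edges, and then replaces each direct distance with the shortest-path distance on the resulting weighted graph using the Floyd-Warshall algorithm.

```python
import math


def counts_to_distances(counts, alpha=1.0):
    """Direct conversion via an assumed inverse power law d_ij = c_ij ** (-alpha).

    Zero counts carry no direct distance information, so they are
    represented as missing edges (infinite distance).
    """
    n = len(counts)
    dist = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0.0
        for j in range(n):
            if i != j and counts[i][j] > 0:
                dist[i][j] = counts[i][j] ** (-alpha)
    return dist


def shortest_path_distances(dist):
    """All-pairs shortest paths (Floyd-Warshall) on the weighted graph."""
    n = len(dist)
    sp = [row[:] for row in dist]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = sp[i][k] + sp[k][j]
                if via_k < sp[i][j]:
                    sp[i][j] = via_k
    return sp


# Hypothetical symmetric contact-count matrix for four loci; note the zeros.
counts = [
    [0, 10, 0, 0],
    [10, 0, 5, 1],
    [0, 5, 0, 4],
    [0, 1, 4, 0],
]

direct = counts_to_distances(counts)
sp = shortest_path_distances(direct)

for row in sp:
    print(["%.2f" % d for d in row])
```

In this toy matrix, loci 0 and 2 have a zero count and therefore no direct distance, yet the shortest-path step still assigns them a finite distance through locus 1. The resulting complete distance matrix is the kind of input that a multi-dimensional scaling step would then take.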