Abstract

High-throughput chromosome conformation capture assays, such as Hi-C, have shown that the genome is organized into structural units such as topologically associating domains (TADs), which can impact gene regulatory processes. The sparsity of Hi-C matrices poses a challenge for reliable detection of these units. We present GRiNCH, a constrained matrix-factorization-based approach for simultaneous smoothing and discovery of TADs from sparse contact count matrices. GRiNCH shows superior performance against seven TAD-calling methods and three smoothing methods. GRiNCH is applicable to multiple platforms including SPRITE and HiChIP and can predict novel boundary factors with potential roles in genome organization.

Highlights

  • The three-dimensional (3D) organization of the genome has emerged as an important layer of gene regulation in developmental processes, disease progression, and evolution [1,2,3,4,5,6]

  • GRiNCH has several properties that make it attractive for analyzing these count matrices: (1) matrix factorization methods, including non-negative matrix factorization (NMF), have a “matrix completion” capability, which can be used to smooth noisy, sparse matrices; (2) the low-dimensional factors provide a clustering of the row and column entities that can be used to define chromosomal structural units; (3) the non-negativity constraint on the factors provides a parts-based representation of the data and is well suited for count datasets; and (4) GRiNCH can be applied to any count matrix measuring chromosomal interactions between genomic loci, such as high-throughput chromosome conformation capture (Hi-C) [37], Split-Pool Recognition of Interactions by Tag Extension (SPRITE) [9], and HiChIP [38] datasets

  • NMF has been used for bias correction and dimensionality reduction of Hi-C data [39]; that approach is applicable only to symmetric matrices, whereas the GRiNCH implementation can be extended to handle asymmetric matrices
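
Properties (1) to (3) in the highlights above can be illustrated with a small sketch. This is not the GRiNCH algorithm: it is plain NMF with the standard multiplicative update rules, run on a made-up 8-bin contact matrix, and GRiNCH's own constraints are not reproduced. The function name `nmf`, the toy matrix, the rank `k=2`, and the step that rescales each factor before assigning bins are all assumptions chosen for illustration.

```python
import numpy as np


def nmf(X, k, n_iter=500, seed=0, eps=1e-9):
    """Plain NMF, X ~ U @ V.T, via multiplicative updates (illustrative only)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], k)) + eps
    V = rng.random((X.shape[1], k)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so U and V stay >= 0.
        U *= (X @ V) / (U @ (V.T @ V) + eps)
        V *= (X.T @ U) / (V @ (U.T @ U) + eps)
    return U, V


# Hypothetical toy contact matrix: two 4-bin domains with dense
# within-domain counts and weak between-domain counts.
X = np.ones((8, 8))
X[:4, :4] = 10.0
X[4:, 4:] = 10.0
# Mimic sparsity by dropping one within-domain contact (kept symmetric).
X[0, 2] = X[2, 0] = 0.0

U, V = nmf(X, k=2)

# (1) Smoothing: the low-rank product assigns a value to the dropped entry.
smoothed = U @ V.T

# (2) Clustering: rescale each factor to a maximum of 1, then assign every
# bin to the factor on which it loads most strongly.
labels = np.argmax(U / U.max(axis=0), axis=1)

print(np.round(smoothed[0, 2], 2), labels)
```

In this toy setting the reconstructed matrix assigns a positive count to the dropped contact, and the two groups of bins load on different factors. Real contact matrices are far larger and noisier, so the rank and the rule for turning factors into domains would need more care than this sketch takes.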


Summary

Introduction

The three-dimensional (3D) organization of the genome has emerged as an important layer of gene regulation in developmental processes, disease progression, and evolution [1,2,3,4,5,6]. High-throughput 3C data captured from diverse biological contexts and processes have led to an improved understanding of DNA packaging in the nucleus, the dynamics of 3D conformation across developmental stages [10], and the differences between normal and disease cellular states [4, 11]. Analysis of such datasets has shown that chromosomal regions preferentially interact with one another, giving rise to higher-order structural units such as chromosomal territories, compartments, and topologically associating domains (TADs), which differ in the. Accurate identification of TADs is an important goal for linking 3D genome organization to cellular function.

Methods
Results
Discussion
Conclusion
