Abstract
BackgroundWith the development of chromosomal conformation capturing techniques, particularly, the Hi-C technique, the study of the spatial conformation of a genome is becoming an important topic in bioinformatics and computational biology. The Hi-C technique can generate genome-wide chromosomal interaction (contact) data, which can be used to investigate the higher-level organization of chromosomes, such as Topologically Associated Domains (TAD), i.e., locally packed chromosome regions bounded together by intra chromosomal contacts. The identification of the TADs for a genome is useful for studying gene regulation, genomic interaction, and genome function.ResultsHere, we formulate the TAD identification problem as an unsupervised machine learning (clustering) problem, and develop a new TAD identification method called ClusterTAD. We introduce a novel method to represent chromosomal contacts as features to be used by the clustering algorithm. Our results show that ClusterTAD can accurately predict the TADs on a simulated Hi-C data. Our method is also largely complementary and consistent with existing methods on the real Hi-C datasets of two mouse cells. The validation with the chromatin immunoprecipitation (ChIP) sequencing (ChIP-Seq) data shows that the domain boundaries identified by ClusterTAD have a high enrichment of CTCF binding sites, promoter-related marks, and enhancer-related histone modifications.ConclusionsAs ClusterTAD is based on a proven clustering approach, it opens a new avenue to apply a large array of clustering methods developed in the machine learning field to the TAD identification problem. The source code, the results, and the TADs generated for the simulated and real Hi-C datasets are available here: https://github.com/BDM-Lab/ClusterTAD.
Highlights
With the development of chromosomal conformation capturing techniques, the Hi-C technique, the study of the spatial conformation of a genome is becoming an important topic in bioinformatics and computational biology
We developed an algorithm to group chromosomal fragments that have similar interaction profiles into clusters, which are used for detecting Topologically Associated Domains (TAD)
The elbow is regarded as the point where adding another cluster does not improve the quality of clustering much
Summary
With the development of chromosomal conformation capturing techniques, the Hi-C technique, the study of the spatial conformation of a genome is becoming an important topic in bioinformatics and computational biology. The identification of the TADs for a genome is useful for studying gene regulation, genomic interaction, and genome function. The knowledge of the high-order organization of chromosomes is useful for the understanding of genome folding, long-range gene interactions and regulations [2], DNA replication [3], and cellular functions [4, 5]. To gain better insights into the organization of the chromosomes in a cell, a technology called the chromosome conformation capture technique such as 3C [6], 4C [7, 8], 5C [9], and Hi-C [10] has been developed to determine spatial chromosomal interaction within a chromosome region, a chromosome or an entire genome.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.