Abstract

Background

Knowing the three-dimensional (3D) structure of chromatin is important for obtaining a complete picture of the regulatory landscape, and changes in the 3D structure have been implicated in diseases. Existing approaches that predict long-range chromatin interactions focus only on interactions between specific genomic regions, namely promoters and enhancers. They neglect other possibilities, for instance the so-called structural interactions involving intervening chromatin.

Results

We present a method that is trained on 5C data and uses the genomic sequence of candidate loci to predict potential genome-wide interaction partners of a particular locus of interest. We built locus-specific support vector machine (SVM)-based predictors using the oligomer distance histogram (ODH) representation. The method performs well, with a mean test AUC (area under the receiver operating characteristic (ROC) curve) of 0.7 or higher for various regions across the cell lines GM12878, K562 and HeLa-S3. Where a locus did not have enough candidate interaction partners for model training, we employed multitask learning to share knowledge between the models of different loci; in this scenario, the method attained an average AUC increase of 0.09 across the three cell lines. Models trained on 5C data and evaluated on an independent high-resolution Hi-C dataset (a rather hard problem) reach an average AUC of 0.56. Additionally, we developed new, intuitive visualization methods for interpreting the sequence signals that contributed to the prediction of locus-specific interaction partners. The analysis of these sequence signals suggests a potential general role of short tandem repeat sequences in genome organization.

Conclusions

We demonstrated how our approach can 1) provide insights into the sequence features of locus-specific interaction partners and 2) identify their cell-line specificity. That our models deem short tandem repeat sequences discriminative for predicting potential interaction partners suggests that these repeats could play a larger role in genome organization. Thus, our approach can (a) help to broadly understand, at the sequence level, chromatin interactions and higher-order structures such as (meta-)topologically associating domains (TADs); (b) help study regions omitted by existing prediction approaches that rely on various information sources (e.g., epigenetic information); and (c) improve methods that predict the 3D structure of chromatin.
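To make the oligomer distance histogram (ODH) representation more concrete, the following Python sketch computes ODH-style counts for a single DNA sequence. It assumes a simple definition in which, for every ordered pair of k-mers (a, b) and every distance d from 1 to a maximum D, the feature counts how often b starts exactly d positions after a. The function names `odh_features` and `to_dense`, the parameters `k` and `max_dist`, and the skipping of k-mers containing `N` are illustrative assumptions, not the settings used in the study.

```python
from collections import Counter
from itertools import product


def odh_features(seq, k=2, max_dist=3, alphabet="ACGT"):
    """Sparse ODH-style counts for one DNA sequence (hypothetical sketch).

    Key (a, b, d) counts how often k-mer b starts exactly d positions after
    an occurrence of k-mer a, for 1 <= d <= max_dist. k-mers containing
    characters outside `alphabet` (for example 'N') are skipped.
    """
    seq = seq.upper()
    valid = {"".join(p) for p in product(alphabet, repeat=k)}
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter()
    for i, a in enumerate(kmers):
        if a not in valid:
            continue
        for d in range(1, max_dist + 1):
            j = i + d
            if j >= len(kmers):
                break  # partner k-mer would run past the end of the sequence
            if kmers[j] in valid:
                counts[(a, kmers[j], d)] += 1
    return counts


def to_dense(counts, k=2, max_dist=3, alphabet="ACGT"):
    """Flatten sparse counts into a fixed-order vector of length |alphabet|**(2k) * max_dist."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    return [counts.get((a, b, d), 0)
            for a in kmers for b in kmers for d in range(1, max_dist + 1)]
```

The dense vector has |A|^(2k) · D entries (768 for k = 2 and D = 3), so it grows quickly with k; the sparse map is the more practical input for a kernel method. Whether and how to normalize the counts (for instance by sequence length) is a choice this sketch leaves open.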

Highlights

  • Knowing the three-dimensional (3D) structure of the chromatin is important for obtaining a complete picture of the regulatory landscape

  • Our pipeline is applicable to long-range contact information from 4C, 5C or Hi-C experiments; we describe it together with the corresponding computational experiments on data from a 5C experiment [18] that detects interactions between a group of transcription start site (TSS)-containing regions (TCRs [18]) and distal enhancers in the three cell lines GM12878, K562 and HeLa-S3

  • For each cell line, we built a separate classifier per TSS-containing region (TCR)
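The per-cell-line, per-TCR design can be sketched as a dictionary of independent binary classifiers keyed by (cell line, TCR), each evaluated by the ROC AUC. To stay dependency-free, this hypothetical sketch swaps in a plain linear SVM trained with the Pegasos subgradient method (with the bias folded in as a constant feature) instead of whatever SVM solver and kernel the authors used. The function names, the data layout, and the hyperparameters `lam` and `epochs` are assumptions. The AUC is computed in its rank form, as the fraction of positive/negative pairs ranked correctly, with ties counting one half.

```python
"""Hypothetical sketch: one linear SVM per (cell line, TCR), scored by ROC AUC."""


def train_linear_svm(X, y, lam=0.01, epochs=50):
    """Pegasos-style subgradient descent on the L2-regularized hinge loss.

    X: list of feature vectors; y: labels in {-1, +1}.
    Returns an augmented weight vector whose last entry acts as the bias.
    Examples are visited in a fixed order, so training is deterministic.
    """
    w = [0.0] * (len(X[0]) + 1)
    t = 0
    for _ in range(epochs):
        for x, label in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)                      # Pegasos step size
            xa = list(x) + [1.0]                       # constant feature for the bias
            margin = label * sum(wi * xi for wi, xi in zip(w, xa))
            w = [(1.0 - eta * lam) * wi for wi in w]   # regularization shrink
            if margin < 1.0:                           # hinge loss is active
                w = [wi + eta * label * xi for wi, xi in zip(w, xa)]
    return w


def decision(w, x):
    """Signed SVM score of a single feature vector."""
    return sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))


def roc_auc(scores, labels):
    """Rank-based AUC: share of (positive, negative) pairs ordered correctly."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab != 1]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5                            # ties count as half a win
    return wins / (len(pos) * len(neg))


def fit_per_tcr(datasets, lam=0.01, epochs=50):
    """datasets: {(cell_line, tcr_id): (X, y)} -> {(cell_line, tcr_id): weights}.

    Every model is trained independently; nothing is shared between keys.
    """
    return {key: train_linear_svm(X, y, lam, epochs)
            for key, (X, y) in datasets.items()}
```

In an actual setting, the feature vectors would be sequence representations of candidate loci (for example, ODH-style counts) and the labels would separate interacting from non-interacting candidates. Keeping one model per key makes it straightforward to report a separate test AUC for each TCR and cell line; sharing knowledge between such models, as in the multitask setting described in the abstract, is deliberately not attempted here.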


Introduction

Knowing the three-dimensional (3D) structure of chromatin is important for obtaining a complete picture of the regulatory landscape. Studies have revealed correlations of long-range chromatin interactions with the functional state of the cell (e.g., [12]) and, more generally, with cell-type specificity [11]. These long-range interactions comprise pairs of loci that are close in space but not necessarily close in sequence. The spatial co-localization of different chromosomal regions (in cis as well as in trans) can result from a mix of factors: specific, direct contacts between two loci; nonspecific binding resulting from the packing of the chromatin fibre; or co-localization due to functional association or a shared subnuclear structure [13].
