Abstract

Background: Accurate identification of Transcriptional Regulator (TR) binding locations is essential for analysis of genomic regions, including Cis Regulatory Elements (CREs). The customary NGS approaches, predominantly ChIP-seq, can be obscured by data anomalies and biases which are difficult to detect without supervision.

Results: Here, we develop a method that leverages the usual combinations observed between many experimental series to mark such atypical peaks. We use deep learning to perform a lossy compression of the genomic regions' representations with multiview convolutions. Using artificial data, we show that our method correctly identifies groups of correlating series and evaluates CREs according to group completeness. It is then applied to the large volume of curated ChIP-seq data in the ReMap database. We show that peaks lacking known biological correlators are singled out and are less often confirmed in real data. We propose normalization approaches useful in interpreting black-box models.

Conclusion: Our approach detects peaks that are less corroborated than average. It can be extended to other similar problems, and can be interpreted to identify correlation groups. It is implemented in an open-source tool called atyPeak.

Highlights

  • Representation and processing of cis-regulatory modules: to apply our method, each cis-regulatory module (CRM) is first converted to a 3D tensor representation of the peaks it contains, where the X, Y, and Z axes represent, respectively, genomic position, dataset of origin, and transcriptional regulator (TR) of interest (Fig. 1a).

  • The representations are viewed by the model through convolutional filters. These focus first on the correlations between datasets and between TRs, in a stacked multiview approach (Fig. 1b). This produces an encoded representation of the CRM, which is passed to a decoder that attempts to reconstruct the original.
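
The tensor representation described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the atyPeak implementation: the function name `crm_to_tensor`, the fixed `bin_size`, the use of a peak score as the cell value, and the example dataset labels are all hypothetical choices made here.

```python
import numpy as np


def crm_to_tensor(peaks, crm_start, crm_end, datasets, trs, bin_size=10):
    """Encode the peaks of one CRM as a (position, dataset, TR) tensor.

    peaks: iterable of (start, end, dataset, tr, score) tuples.
    Binning and score-valued cells are assumptions of this sketch.
    """
    n_bins = -(-(crm_end - crm_start) // bin_size)  # ceiling division
    tensor = np.zeros((n_bins, len(datasets), len(trs)), dtype=np.float32)
    ds_index = {name: i for i, name in enumerate(datasets)}
    tr_index = {name: i for i, name in enumerate(trs)}

    for start, end, dataset, tr, score in peaks:
        # Clip the peak to the CRM and shift to CRM-relative coordinates.
        lo = max(start, crm_start) - crm_start
        hi = min(end, crm_end) - crm_start
        if hi <= lo:
            continue  # peak lies entirely outside the CRM
        first_bin = lo // bin_size
        last_bin = -(-hi // bin_size)  # exclusive upper bin (ceiling)
        d, t = ds_index[dataset], tr_index[tr]
        # Keep the strongest score where peaks overlap in the same cell.
        tensor[first_bin:last_bin, d, t] = np.maximum(
            tensor[first_bin:last_bin, d, t], score
        )
    return tensor


# Made-up example: two peaks from two datasets on a 100 bp CRM.
example_peaks = [
    (100, 130, "dataset_A", "CTCF", 1.0),
    (120, 160, "dataset_B", "RAD21", 0.5),
]
tensor = crm_to_tensor(
    example_peaks, crm_start=100, crm_end=200,
    datasets=["dataset_A", "dataset_B"], trs=["CTCF", "RAD21"],
)
print(tensor.shape)  # (10, 2, 2)
```

Each slice along the dataset or TR axis is then one "view" of the CRM, which is what convolutional filters over those axes would see.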

Introduction

Accurate identification of Transcriptional Regulator (TR) binding locations is essential for analysis of genomic regions, including Cis Regulatory Elements (CREs). The decreasing cost of gene sequencing and other genomic assays localizing various regions of interest (epigenomic features, TF binding regions) has resulted in a wealth of experimental data from the broader scientific community as well as from large consortia (e.g., ENCODE [1]). These data have been collated in warehouses such as the GEO database [2] or ArrayExpress [3] to facilitate inference and functional annotation of genomic regions. We focus on improving CRE detection and characterization through better identification of TR binding locations.
