Zipper plot: visualizing transcriptional activity of genomic regions

Francisco Avila Cobos, Celine Everaert, Jasper Anckaert, Jo Vandesompele, Pieter Mestdagh, Katleen De Preter, Dries Rombaut, Pieter-Jan Volders

doi:10.1186/s12859-017-1651-7

Open Access

https://doi.org/10.1186/s12859-017-1651-7

Abstract

Background: Reconstructing transcript models from RNA-sequencing (RNA-seq) data and establishing these as independent transcriptional units can be a challenging task. Current state-of-the-art tools for long non-coding RNA (lncRNA) annotation are mainly based on evolutionary constraints, which may result in false negatives due to the overall limited conservation of lncRNAs.

Results: To tackle this problem we have developed the Zipper plot, a novel visualization and analysis method that enables users to simultaneously interrogate thousands of human putative transcription start sites (TSSs) in relation to various features that are indicative for transcriptional activity. These include publicly available CAGE-sequencing, ChIP-sequencing and DNase-sequencing datasets. Our method only requires three tab-separated fields (chromosome, genomic coordinate of the TSS and strand) as input and generates a report that includes a detailed summary table, a Zipper plot and several statistics derived from this plot.

Conclusion: Using the Zipper plot, we found evidence of transcription for a set of well-characterized lncRNAs and observed that fewer mono-exonic lncRNAs have CAGE peaks overlapping with their TSSs compared to multi-exonic lncRNAs. Using publicly available RNA-seq data, we found more than one hundred cases where junction reads connected protein-coding gene exons with a downstream mono-exonic lncRNA, revealing the need for a careful evaluation of lncRNA 5′-boundaries. Our method is implemented using the statistical programming language R and is freely available as a webtool.

Highlights

Reconstructing transcript models from RNA-sequencing (RNA-seq) data and establishing these as independent transcriptional units can be a challenging task
We studied the distribution of the closest CAGE-seq peaks (FANTOM5 data) around the transcription start site (TSS) of all mono-exonic and all multi-exonic human long non-coding RNA (lncRNA) transcripts (21,102 and 90,508, respectively) (Fig. 3a–c) and found that 589 mono-exonic lncRNAs (2.8%) had a CAGE peak overlapping the TSS and 6256 (29.7%) had a peak within a ±5 kb window
Using the Zipper plot we found evidence of transcription for a set of well-characterized lncRNAs and observed that fewer mono-exonic lncRNAs have CAGE peaks overlapping with their TSSs compared to multi-exonic lncRNAs
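
The two quantities reported in these highlights (a CAGE peak overlapping the TSS, or a peak within a 5 kb window) can be illustrated with a minimal Python sketch. It is not the authors' R implementation. It reads the three tab-separated input fields described in the abstract and, for each TSS, takes the distance to the closest peak on the same chromosome and strand. The function names, the same-strand matching, the closed-interval coordinates, and the toy data are all assumptions made for illustration.

```python
import csv
import io

WINDOW = 5000  # the 5 kb window mentioned in the highlights


def read_tss(handle):
    """Parse tab-separated lines: chromosome, TSS coordinate, strand."""
    rows = csv.reader(handle, delimiter="\t")
    return [(chrom, int(pos), strand) for chrom, pos, strand in rows]


def closest_peak_distance(tss, peaks):
    """Unsigned distance from a TSS to the nearest same-strand peak.

    Peaks are (chrom, start, end, strand) tuples treated as closed
    intervals (an assumption). Returns 0 when a peak overlaps the TSS
    and None when no peak exists on that chromosome and strand.
    A linear scan is used for clarity, not speed.
    """
    chrom, pos, strand = tss
    distances = [
        0 if start <= pos <= end else min(abs(pos - start), abs(pos - end))
        for p_chrom, start, end, p_strand in peaks
        if p_chrom == chrom and p_strand == strand
    ]
    return min(distances) if distances else None


def classify(distance, window=WINDOW):
    """Label a TSS by how close its nearest peak is."""
    if distance is None:
        return "no_peak"
    if distance == 0:
        return "overlap"
    return "within_window" if distance <= window else "outside_window"


# Toy data, made up for illustration (not taken from the paper).
tss_text = "chr1\t1000\t+\nchr1\t9000\t-\nchr2\t500\t+\n"
peaks = [
    ("chr1", 990, 1010, "+"),
    ("chr1", 12000, 12050, "-"),
    ("chr1", 20000, 20100, "-"),
]

tss_list = read_tss(io.StringIO(tss_text))
labels = [classify(closest_peak_distance(t, peaks)) for t in tss_list]
print(labels)
```

Counting the "overlap" and "within_window" labels over a full transcript set would give proportions of the kind quoted above, although the exact figures depend on the peak set and on coordinate conventions that this sketch does not reproduce.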

Summary

Introduction

Reconstructing transcript models from RNA-sequencing (RNA-seq) data and establishing these as independent transcriptional units can be a challenging task. Current state-of-the-art tools for long non-coding RNA (lncRNA) annotation are mainly based on evolutionary constraints, which may result in false negatives due to the overall limited conservation of lncRNAs. The introduction of RNA-seq has revolutionized the field of molecular biology, revealing that up to 75% of the human genome is actively transcribed [1]. The majority of this transcriptome consists of so-called long non-coding RNAs (lncRNAs). Reconstructing accurate transcript models for these lncRNAs is a major challenge when processing RNA-seq data. Distinguishing single-exon fragments that represent independent transcriptional units from those that result from genomic DNA contamination or incomplete transcript assembly is not straightforward.

Journal: BMC Bioinformatics	Publication Date: May 2, 2017
Citations: 5	License type: open-access
