Abstract

Motivation: Quantification of isoform abundance has been extensively studied at the mature RNA level using RNA-seq but not at the level of precursor RNAs using nascent RNA sequencing.

Results: We address this problem with a new computational method called Deconvolution of Expression for Nascent RNA-sequencing data (DENR), which models nascent RNA-sequencing read-counts as a mixture of user-provided isoforms. The baseline algorithm is enhanced by machine-learning predictions of active transcription start sites and an adjustment for the typical ‘shape profile’ of read-counts along a transcription unit. We show that DENR outperforms simple read-count-based methods for estimating gene and isoform abundances, and that transcription of multiple pre-RNA isoforms per gene is widespread, with frequent differences between cell types. In addition, we provide evidence that a majority of human isoform diversity derives from primary transcription rather than from post-transcriptional processes.

Availability and implementation: DENR and nascentRNASim are freely available at https://github.com/CshlSiepelLab/DENR (version v1.0.0) and https://github.com/CshlSiepelLab/nascentRNASim (version v0.3.0).

Supplementary information: Supplementary data are available at Bioinformatics online.

Highlights

  • SEC22C has 16 isoforms, which are merged into eight precursor RNA (pre-RNA) isoforms; SS18L2 has three isoforms, which are merged into two; and NKTR has 19 isoforms, which are merged into ten

  • Our observations are qualitatively similar to those from a number of previous studies reporting widespread, regulated alternative transcription start site (TSS) usage, often in a tissue-specific manner (Carninci et al, 2006; Forrest et al, 2014; Demircioglu et al, 2019), some of which have argued for a primary role of transcription relative to splicing (Pal et al, 2011; Reyes and Huber, 2018)

  • […] that share […] Deconvolution of Expression for Nascent RNA-sequencing data (DENR), the first fully vetted computational method, to our knowledge, to address the abundance estimation problem at the level of pre-RNA isoforms, based on nascent […]

Introduction

For about the last 15 years, most large-scale transcriptomic studies have relied on high-throughput short-read sequencing technologies as the […]. In species with available genome assemblies, these sequence reads are generally mapped to assembled contigs, and the “read depth,” or average density of aligned reads, is used as a proxy for the abundance of […]

[…] same segment of DNA often serves as a template for multiple distinct RNA transcripts. […] level of multiple isoforms for each gene, owing to alternative transcription start sites (TSSs), alternative polyadenylation and cleavage sites (PAS), and alternative splicing (Wang et al, 2008).
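To make the overlap problem concrete, the toy sketch below treats per-bin read depth as a mixture of two overlapping isoforms and recovers their abundances by non-negative least squares, solved here with projected gradient descent. It is only an illustration of the mixture idea under stated assumptions, not DENR's implementation: the function names, bin coordinates, and abundances are hypothetical, the data are noise-free, and the TSS predictions and shape-profile adjustment described in the abstract are not modeled.

```python
# Toy illustration only: coordinates, abundances, and function names are hypothetical.

def isoform_design(n_bins, isoforms):
    """Build a 0/1 matrix: entry [b][k] is 1.0 if isoform k covers bin b."""
    return [
        [1.0 if start <= b < end else 0.0 for (start, end) in isoforms]
        for b in range(n_bins)
    ]


def deconvolve(counts, design, n_iter=2000):
    """Non-negative least squares by projected gradient descent."""
    n_iso = len(design[0])
    # Safe step size: the trace of X^T X bounds its largest eigenvalue.
    step = 1.0 / sum(x * x for row in design for x in row)
    abundance = [0.0] * n_iso
    for _ in range(n_iter):
        residual = [
            sum(x * a for x, a in zip(row, abundance)) - c
            for row, c in zip(design, counts)
        ]
        gradient = [
            sum(row[k] * r for row, r in zip(design, residual))
            for k in range(n_iso)
        ]
        # Gradient step, then clip at zero to keep abundances non-negative.
        abundance = [max(0.0, a - step * g) for a, g in zip(abundance, gradient)]
    return abundance


# Two hypothetical pre-RNA isoforms over 10 bins: the second starts at an internal TSS.
isoforms = [(0, 10), (4, 10)]
true_abundance = [3.0, 5.0]
design = isoform_design(10, isoforms)
# Noise-free per-bin read depth implied by the mixture.
counts = [sum(x * a for x, a in zip(row, true_abundance)) for row in design]

estimates = deconvolve(counts, design)
# Naive alternative: mean read depth over each isoform's span.
naive = [sum(counts[start:end]) / (end - start) for (start, end) in isoforms]
```

In this noise-free case the mixture fit recovers abundances of 3 and 5, whereas the naive mean depth gives 6 and 8, because each isoform's span also collects reads from the other isoform.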
