Abstract

Polyadenylation at the 3′-end is a major regulator of messenger RNA, and poly(A) tail length is known to affect nuclear export, stability, and translation, among other processes. Only recently have strategies emerged that allow for genome-wide poly(A) length assessment. These methods identify genes connected to poly(A) tail measurements indirectly by short-read alignment to genetic 3′-ends. Concurrently, Oxford Nanopore Technologies (ONT) established full-length isoform-specific RNA sequencing containing the entire poly(A) tail. However, assessing poly(A) length through base-calling has so far not been possible due to the inability to resolve long homopolymeric stretches in ONT sequencing. Here we present tailfindr, an R package to estimate poly(A) tail length from ONT long-read sequencing data. tailfindr operates on unaligned, base-called data. It measures poly(A) tail length from both native RNA and DNA sequencing, which makes poly(A) tail studies by full-length cDNA approaches possible for the first time. We assess tailfindr’s performance across different poly(A) lengths, demonstrating that tailfindr is a versatile tool providing poly(A) tail estimates across a wide range of sequencing conditions.

Highlights

  • The poly(A) tail is a homopolymeric stretch of adenosines at the 3′-end of the majority of eukaryotic mRNAs

  • tailfindr is an R tool that estimates poly(A) tail length from individual reads directly from Oxford Nanopore Technologies (ONT) FAST5 raw data. It estimates poly(A) tails from both RNA and DNA reads, including DNA reverse-complement reads containing poly(T) stretches. tailfindr takes the raw data as input without prior alignment, and estimates tail length by normalizing with the read-specific nucleotide translocation rate

  • We validate the performance of tailfindr on a set of RNA and DNA molecules with defined poly(A) tail lengths. tailfindr operates on the output of widely used as well as the most recent ONT base-calling applications
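
The normalization step described in the highlights can be illustrated with a small arithmetic sketch. This is not tailfindr's code (tailfindr is an R package); the function names, the sample indices, and the way the translocation rate is derived here (total raw samples divided by called bases) are assumptions made purely for illustration.

```python
def samples_per_nucleotide(n_raw_samples, n_called_bases):
    """Hypothetical read-specific translocation rate, in raw samples per nucleotide.

    Assumption: the rate is approximated as the number of raw signal samples
    spanning the base-called part of the read divided by the number of bases called.
    """
    if n_called_bases <= 0:
        raise ValueError("n_called_bases must be positive")
    return n_raw_samples / n_called_bases


def estimate_tail_length(tail_start, tail_end, rate):
    """Convert a tail segment of the raw signal into a length in nucleotides.

    tail_start and tail_end are hypothetical sample indices that delimit the
    poly(A) or poly(T) stretch in the raw signal; rate is samples per nucleotide.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    if tail_end < tail_start:
        raise ValueError("tail_end must not precede tail_start")
    return (tail_end - tail_start) / rate


# Illustrative numbers only: 30,000 samples covering 1,000 called bases
# give 30 samples per nucleotide, so an 1,800-sample tail is about 60 nt.
rate = samples_per_nucleotide(30_000, 1_000)
tail_nt = estimate_tail_length(1_500, 3_300, rate)
print(rate, tail_nt)
```

Because the rate is computed per read, two reads with tails of equal signal duration can yield different length estimates if they passed through the pore at different speeds.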


Summary

INTRODUCTION

The poly(A) tail is a homopolymeric stretch of adenosines at the 3′-end of the majority of eukaryotic mRNAs. While recent genome-wide studies allowed a thorough understanding of poly(A) tail lengths throughout the transcriptome for the first time, they are technically restricted to a specific size of poly(A) tails, depending on sample enrichment and sequencing strategy. Most of these techniques rely on PCR amplification of the poly(A) tail region, which might lead to amplification artifacts that affect poly(A) length measurements as well as quantitative comparisons between long and short poly(A) tails (Hite et al. 1996; Murray and Schoenberg 2008; Hommelsheim et al. 2014). We validate the performance of tailfindr on a set of RNA and DNA molecules with defined poly(A) tail lengths. tailfindr operates on the output of widely used as well as the most recent ONT base-calling applications (flip-flop model).

