Abstract

Transcription of eukaryotic genomes involves complex alternative processing of RNAs. Sequencing of full-length RNAs using long reads reveals the true complexity of processing. However, the relatively high error rates of long-read sequencing technologies can reduce the accuracy of intron identification. Here we apply alignment metrics and machine-learning-derived sequence information to filter spurious splice junctions from long-read alignments and use the remaining junctions to guide realignment in a two-pass approach. This method, available in the software package 2passtools (https://github.com/bartongroup/2passtools), improves the accuracy of spliced alignment and transcriptome assembly for species both with and without existing high-quality annotations.
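
The filter-then-realign idea described above can be illustrated with a minimal sketch. This is not the 2passtools implementation: the `Junction` record, the `min_reads` threshold, and the canonical-motif check are hypothetical stand-ins for the alignment metrics and machine-learning-derived sequence information the abstract mentions. The sketch counts how many first-pass reads support each splice junction, keeps only junctions that pass both checks, and formats the survivors as BED-style lines that could serve as hints for a second alignment pass.

```python
from collections import Counter
from typing import Dict, Iterable, List, NamedTuple


class Junction(NamedTuple):
    """Hypothetical splice-junction record taken from one first-pass read."""
    chrom: str
    start: int   # 0-based position of the first intron base
    end: int     # position one past the last intron base
    strand: str
    motif: str   # donor + acceptor dinucleotides, e.g. "GTAG"


def filter_junctions(first_pass: Iterable[Junction],
                     min_reads: int = 2,
                     canonical: tuple = ("GTAG",)) -> Dict[Junction, int]:
    """Keep junctions seen in at least `min_reads` reads with a canonical motif.

    Both criteria are illustrative assumptions, not the published filter.
    """
    counts = Counter(first_pass)
    return {j: n for j, n in counts.items()
            if n >= min_reads and j.motif in canonical}


def to_bed_lines(kept: Dict[Junction, int]) -> List[str]:
    """Format retained junctions as BED6-style lines (score = read support)."""
    return [f"{j.chrom}\t{j.start}\t{j.end}\t.\t{n}\t{j.strand}"
            for j, n in sorted(kept.items())]


if __name__ == "__main__":
    reads = [
        Junction("chr1", 100, 200, "+", "GTAG"),
        Junction("chr1", 100, 200, "+", "GTAG"),
        Junction("chr1", 100, 205, "+", "GTAG"),  # single read: likely spurious
        Junction("chr1", 300, 400, "+", "ATCC"),  # non-canonical motif
    ]
    for line in to_bed_lines(filter_junctions(reads)):
        print(line)
```

In a two-pass workflow, the lines produced by `to_bed_lines` would be written to a file and supplied to the aligner as junction hints for the second pass, so that reads are realigned against the filtered junction set rather than against every junction seen in the first pass.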

Highlights

  • Understanding eukaryotic genomes requires knowing the DNA sequence and which RNAs are transcribed from it

  • We used four nanopore direct RNA sequencing (DRS) datasets generated from Arabidopsis seedlings [11] and four datasets generated from human cell lines [10]

  • Because these datasets are likely to contain novel splice junctions which do not appear in reference annotations, we simulated full-length reads using the Arabidopsis and human reference transcriptomes, AtRTD2 [23] and GRCh38 [24], respectively

Introduction

Understanding eukaryotic genomes requires knowing the DNA sequence and which RNAs are transcribed from it. RNA polymerase II is associated with multiple alternative RNA processing events that diversify the coding and regulatory potential of the genome. Alternative processing choices include distinct transcription start sites, the alternative splicing of different intron and exon combinations, alternative sites of cleavage and polyadenylation, and base modifications such as methylation of adenosines. Changes in RNA processing can reflect the reprogramming of gene expression patterns during development or in response to stress, or result from genetic mutation or disease. The identification and quantification of different RNA processing events is crucial to understanding what genomes encode and the biology of whole organisms [2].

Parker et al. Genome Biology (2021) 22:72
