Abstract
Transcription of eukaryotic genomes involves complex alternative processing of RNAs. Sequencing of full-length RNAs using long reads reveals the true complexity of processing. However, the relatively high error rates of long-read sequencing technologies can reduce the accuracy of intron identification. Here we apply alignment metrics and machine-learning-derived sequence information to filter spurious splice junctions from long-read alignments and use the remaining junctions to guide realignment in a two-pass approach. This method, available in the software package 2passtools (https://github.com/bartongroup/2passtools), improves the accuracy of spliced alignment and transcriptome assembly for species both with and without existing high-quality annotations.
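The filtering step of the two-pass idea can be illustrated with a minimal sketch. This is not the 2passtools implementation: the `Junction` type, the `min_support` threshold, and the canonical GT/AG motif check are hypothetical simplifications standing in for the alignment metrics and machine-learning-derived sequence information described above. Junctions collected from a first-pass alignment are filtered, and the surviving set would then be supplied to the aligner to guide a second pass.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple, Set


class Junction(NamedTuple):
    """Hypothetical splice junction record (intron coordinates, 0-based, end-exclusive)."""
    chrom: str
    start: int
    end: int
    strand: str


_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse-complement a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def has_canonical_motif(junction: Junction, genome: Dict[str, str]) -> bool:
    """Return True if the intron starts with GT and ends with AG on its own strand."""
    intron = genome[junction.chrom][junction.start:junction.end].upper()
    if junction.strand == "-":
        intron = revcomp(intron)
    return intron[:2] == "GT" and intron[-2:] == "AG"


def filter_junctions(
    first_pass: Iterable[Junction],
    genome: Dict[str, str],
    min_support: int = 2,  # assumed threshold, for illustration only
) -> Set[Junction]:
    """Keep junctions seen in at least `min_support` reads that have a canonical motif."""
    support = Counter(first_pass)
    return {
        junction
        for junction, n_reads in support.items()
        if n_reads >= min_support and has_canonical_motif(junction, genome)
    }


# Toy genome and first-pass junctions (one entry per supporting read).
genome = {"chr1": "AAAGTCCCCAGTTT", "chr2": "TTCTGGGGACAA"}
good_plus = Junction("chr1", 3, 11, "+")    # GT...AG intron
shifted = Junction("chr1", 4, 11, "+")      # misplaced donor site, no GT
good_minus = Junction("chr2", 2, 10, "-")   # GT...AG after reverse complement
first_pass = [good_plus, good_plus, shifted, shifted, good_minus, good_minus]

guide_junctions = filter_junctions(first_pass, genome)
```

In this toy example, `guide_junctions` retains the two junctions with canonical motifs and drops the shifted one, even though it has the same read support. A real filter would use richer evidence than read counts and a fixed motif.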
Highlights
Understanding eukaryotic genomes requires knowing the DNA sequence and which RNAs are transcribed from it
We used four nanopore direct RNA sequencing (DRS) datasets generated from Arabidopsis seedlings [11] and four datasets generated from human cell lines [10]
Because these datasets are likely to contain novel splice junctions which do not appear in reference annotations, we simulated full-length reads using the Arabidopsis and human reference transcriptomes, AtRTD2 [23] and GRCh38 [24], respectively
Summary
Understanding eukaryotic genomes requires knowing the DNA sequence and which RNAs are transcribed from it. RNA polymerase II is associated with multiple alternative RNA processing events that diversify the coding and regulatory potential of the genome. Alternative processing choices include distinct transcription start sites, the alternative splicing of different intron and exon combinations, alternative sites of cleavage and polyadenylation, and base modifications such as methylation of adenosines. Changes in RNA processing can reflect the reprogramming of gene expression patterns during development or in response to stress, or result from genetic mutation or disease. The identification and quantification of different RNA processing events is crucial to understand what genomes encode and the biology of whole organisms [2].
Parker et al Genome Biology (2021) 22:72